Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3

2026-06-01 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cloud Computing & IT Infrastructure · Depth: Expert, long

Summary

NVIDIA has released Cosmos 3, a frontier foundation model for physical AI that unifies physical reasoning, world generation, and action generation in a single open model. The release includes open-source Cosmos 3 models, training scripts, deployment tools, and six synthetic data generation datasets for applications such as robotics and autonomous driving. The architecture is a Mixture-of-Transformers, offered in two sizes: the 16B-parameter Cosmos 3 Nano for workstation-grade compute (NVIDIA RTX PRO 6000 GPU) and the 64B-parameter Cosmos 3 Super for datacenter deployment (NVIDIA Hopper/Blackwell GPUs). Cosmos 3 supports multiple input and output modalities and leads on benchmarks such as VANTAGE-Bench and PAI-Bench. NVIDIA also introduced the Cosmos Human Evaluation (HUE) framework for objective assessment of video generation quality. Training recipes for Supervised Fine-Tuning (SFT) and action post-training are provided, alongside deployment through NVIDIA NIM microservices, which offer NVFP4 quantization and vLLM for efficient inference.

Key takeaway

For machine learning engineers building physical AI systems, NVIDIA Cosmos 3 offers a unified, open-source foundation model. Use the 16B-parameter Nano model for workstation-grade inference or the 64B-parameter Super model for high-quality datacenter workloads, fine-tune on custom data with the provided Supervised Fine-Tuning recipes, and deploy optimized models through NVIDIA NIM microservices.

Key insights

Cosmos 3 unifies physical AI reasoning, world generation, and action generation into a single, open foundation model.

Principles

Unify reasoning and generation for simplified physical AI development.
Open-source models and datasets foster reproducibility.
Objective human evaluation improves model quality assessment.

Method

Cosmos 3 employs a Mixture-of-Transformers with a VLM-based Reasoner tower for interpretation and a diffusion-based Generator tower for physics-aware video and action sequence output.
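
To make the two-tower idea concrete, here is a minimal sketch of one Mixture-of-Transformers layer as the term is commonly used: each token is processed with the projection and feed-forward weights of its own tower, while self-attention runs over the whole sequence so the towers can exchange information. This is an illustration only, not Cosmos 3's implementation. The names `make_tower` and `mot_layer`, the hidden size, the ReLU feed-forward block, the absence of layer normalization and masking, and the assumption that Reasoner and Generator tokens share one attention pass are all hypothetical choices made for the example.

```python
import math
import random

D = 4  # toy hidden size, chosen only for illustration


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_tower(rng):
    """Hypothetical per-tower parameters: attention projections plus an FFN."""
    return {
        "q": rand_matrix(rng, D, D),
        "k": rand_matrix(rng, D, D),
        "v": rand_matrix(rng, D, D),
        "o": rand_matrix(rng, D, D),
        "ffn_in": rand_matrix(rng, 2 * D, D),
        "ffn_out": rand_matrix(rng, D, 2 * D),
    }


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mot_layer(tokens, tags, towers):
    """One simplified layer: tower-specific weights, global self-attention."""
    # Each token is projected with the weights of the tower it belongs to.
    q = [matvec(towers[t]["q"], x) for x, t in zip(tokens, tags)]
    k = [matvec(towers[t]["k"], x) for x, t in zip(tokens, tags)]
    v = [matvec(towers[t]["v"], x) for x, t in zip(tokens, tags)]
    out = []
    for i, (x, tag) in enumerate(zip(tokens, tags)):
        # Attention is global: token i attends to tokens from both towers.
        scores = [sum(a * b for a, b in zip(q[i], kj)) / math.sqrt(D) for kj in k]
        weights = softmax(scores)
        ctx = [sum(weights[j] * v[j][d] for j in range(len(tokens))) for d in range(D)]
        # Output projection and FFN are tower-specific again, with residuals.
        p = towers[tag]
        h = [xi + ai for xi, ai in zip(x, matvec(p["o"], ctx))]
        hidden = [max(0.0, u) for u in matvec(p["ffn_in"], h)]
        out.append([hi + fi for hi, fi in zip(h, matvec(p["ffn_out"], hidden))])
    return out


rng = random.Random(0)
towers = {"reasoner": make_tower(rng), "generator": make_tower(rng)}
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
tags = ["reasoner", "reasoner", "reasoner", "generator", "generator"]
outputs = mot_layer(tokens, tags, towers)
```

In this sketch, changing the Generator tower's feed-forward weights leaves the outputs at Reasoner positions untouched, while changing any input token affects every position through the shared attention step. That separation is what lets each tower specialize without isolating it from the other.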

In practice

Use Cosmos 3 Nano (16B) for real-time robotics on NVIDIA RTX PRO 6000.
Deploy Cosmos 3 via NIM microservices for optimized inference.
Adapt Cosmos 3 with SFT using custom video or action-labeled data.
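
When choosing between Nano and Super for a given GPU, a back-of-the-envelope estimate of weight memory is a useful first check. The sketch below multiplies the parameter counts from the summary by an assumed number of bits per weight. The BF16 baseline and the 4.5-bit NVFP4 figure (4-bit values plus one 8-bit scale per 16-value block) are assumptions for illustration, not numbers from the article, and the estimate ignores activations, caches, and any layers kept in higher precision.

```python
GIB = 1024 ** 3

# Parameter counts as stated in the summary.
MODELS = {"Cosmos 3 Nano": 16e9, "Cosmos 3 Super": 64e9}

# Bits per weight. Both entries are assumptions for this estimate:
# BF16 as the unquantized baseline, and NVFP4 as 4-bit values plus
# one 8-bit scale shared by each block of 16 values.
FORMATS = {"BF16": 16.0, "NVFP4 (assumed)": 4.0 + 8.0 / 16.0}


def weight_gib(params, bits_per_weight):
    """Approximate memory needed for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / GIB


for name, params in MODELS.items():
    for fmt, bits in FORMATS.items():
        print(f"{name:15s} {fmt:16s} {weight_gib(params, bits):6.1f} GiB")
```

Under these assumptions the Nano weights come to roughly 30 GiB in BF16 and roughly 8 GiB in NVFP4, and the Super weights to roughly 119 GiB and 34 GiB. Treat these as lower bounds and compare them against the memory of the target GPU with headroom for runtime state.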

Topics

Physical AI
NVIDIA Cosmos 3
Foundation Models
Robotics
Autonomous Vehicles
Synthetic Data Generation
NVIDIA NIM Microservices

Best for: AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.