NVIDIA brings agents to life with DGX Spark and Reachy Mini

2024-07-29 · Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Intermediate, long

Summary

NVIDIA, at CES 2026, showcased the creation of real-world AI agents using its DGX Spark workstation and Pollen Robotics' Reachy Mini robot. This demonstration highlighted the integration of NVIDIA's open models, including Nemotron reasoning LLMs, Isaac GR00T N1.6 VLA, and Cosmos world foundation models, to enable interactive, desk-side AI companions. The article provides a step-by-step guide to replicate this setup, detailing the use of NVIDIA Nemotron 3 Nano for reasoning, Nemotron Nano 2 VL for vision, and ElevenLabs for text-to-speech. It outlines three deployment options: local on DGX Spark (requiring ~65GB disk for reasoning and ~28GB for vision models), cloud via NVIDIA Brev or Hugging Face Inference Endpoints, or serverless model endpoints. The system leverages the NeMo Agent Toolkit for orchestration and Pipecat for real-time voice and multimodal interactions, allowing users to build customizable, private, and hackable physical AI agents.

Key takeaway

For AI Engineers and Machine Learning Engineers building interactive robotic systems, this guide demonstrates a concrete path to creating physical AI agents. You should consider adopting NVIDIA's open model ecosystem and the NeMo Agent Toolkit to build customizable, private agents. This approach allows full control over intelligence and hardware, moving beyond "black-box" assistants and enabling local inspection and extension of agent capabilities.

Key insights

NVIDIA's DGX Spark and Reachy Mini enable building private, customizable, real-world AI agents using open models.

Principles

Route queries to specialized models based on intent.
Keep tool schemas tight for agent decision-making.
Implement "confirm before actuation" for physical robot safety.

Method

The method involves integrating NVIDIA Nemotron models for reasoning and vision, ElevenLabs for TTS, NeMo Agent Toolkit for orchestration, and Pipecat for real-time multimodal interaction, all running on DGX Spark with Reachy Mini hardware or simulation.

In practice

Use NVIDIA Nemotron 3 Nano for reasoning tasks.
Employ NVIDIA Nemotron Nano 2 VL for visual understanding.
Deploy via NVIDIA NIM or vLLM for self-hosting models.

Topics

AI Agents
NVIDIA DGX Spark
Reachy Mini
NVIDIA NeMo Agent Toolkit
Multimodal AI

Code references

Best for: AI Engineer, Machine Learning Engineer, Robotics Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.