Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark

2026-06-01 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

At Computex 2026, NVIDIA announced updates to its DGX Spark system aimed at simplifying local AI agent development and deployment. A streamlined NemoClaw installation path lets developers run autonomous agents within minutes of the initial model download. NemoClaw is an open-source blueprint that combines open models, an agent harness such as OpenClaw, and the NVIDIA OpenShell runtime for secure, sandboxed execution. For agentic models such as Qwen3.6-35B running on vLLM, NVIDIA's NVFP4 quantized checkpoint and multi-token prediction (MTP) optimizations deliver up to 2.6x faster inference. To scale beyond a single device, NVIDIA Sync now includes a cluster assistant that automates connecting two to four DGX Spark units, for up to 512 GB of combined unified memory to serve large mixture-of-experts (MoE) models or multi-agent pipelines.

Key takeaway

For AI engineers and MLOps teams building autonomous agents under strict privacy or performance requirements, the DGX Spark updates shorten the path from setup to a running agent. The simplified NemoClaw installation gets a local agent running within minutes of the model download, and models such as Qwen3.6-35B run up to 2.6x faster. If a project needs more capacity, NVIDIA Sync's cluster assistant connects two to four DGX Spark units for up to 512 GB of combined memory, enough for larger models and multi-agent pipelines.

Key insights

Pairing DGX Spark hardware with an integrated software stack (NemoClaw, OpenShell, NVIDIA Sync) shortens local agent setup while adding sandboxed execution and multi-node scaling.

Principles

Local agent execution improves security and privacy by keeping context on-device.
Simplified installation workflows cut agent setup time to minutes once the model is downloaded.
Automated multi-node clustering enables scaling local compute for large models.
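
The memory side of the clustering principle can be sketched as simple arithmetic. The example below is an illustration under stated assumptions, not NVIDIA's sizing method: it assumes 128 GB of unified memory per DGX Spark unit (consistent with the 512 GB figure for four units), treats a 4-bit format such as NVFP4 as exactly 4 bits per weight (ignoring scale-factor overhead), and infers the 35-billion parameter count from the model name. The function names and the 0.7 headroom factor (space reserved for KV cache and activations) are hypothetical.

```python
# Rough sizing sketch: how many DGX Spark units are needed to hold model weights?
SPARK_MEMORY_GB = 128  # assumed per-unit unified memory; 4 x 128 = 512 GB
MAX_UNITS = 4          # the cluster assistant connects two to four units


def cluster_memory_gb(units):
    """Combined memory of `units` DGX Spark systems, in GB."""
    if not 1 <= units <= MAX_UNITS:
        raise ValueError("units must be between 1 and 4")
    return units * SPARK_MEMORY_GB


def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def min_units_for(params_billion, bits_per_weight, headroom=0.7):
    """Smallest unit count whose usable memory holds the weights, or None.

    `headroom` is a hypothetical fraction of memory available for weights;
    the remainder is left for KV cache, activations, and the OS.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight)
    for units in range(1, MAX_UNITS + 1):
        if needed <= cluster_memory_gb(units) * headroom:
            return units
    return None
```

Under these assumptions, a 35-billion-parameter model at 4 bits per weight needs about 17.5 GB and fits on a single unit, while a hypothetical 400-billion-parameter MoE model at the same precision needs about 200 GB and would call for three units.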

Method

Install NemoClaw on DGX Spark with a single `curl` command. An interactive wizard then walks through license acceptance and the express install, which sets up Ollama and downloads Qwen3.6-35B automatically.

In practice

Retrieve the gateway token with `nemoclaw gateway-token --quiet` to access the agent WebUI.
Customize agent behavior by editing system prompts and OpenShell network policies.
Use `nemoclaw onboard --fresh --gpu` to create new sandboxes with different models.
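
For teams that script these steps, the two commands above can be wrapped in a small Python helper. This is a minimal sketch, not part of NemoClaw: the subcommands and flags are copied from the list above, while `run_cli` is a hypothetical helper, and the assumption that `gateway-token --quiet` prints only the token to standard output has not been verified against the CLI.

```python
import shutil
import subprocess

# Subcommands and flags are taken from the text; nothing else about the
# nemoclaw CLI is assumed here.
GATEWAY_TOKEN_CMD = ["nemoclaw", "gateway-token", "--quiet"]
ONBOARD_FRESH_CMD = ["nemoclaw", "onboard", "--fresh", "--gpu"]


def run_cli(cmd, capture=False):
    """Run `cmd` if its executable is on PATH.

    Hypothetical helper. Returns None when the executable is missing, so the
    script degrades gracefully on machines without NemoClaw. With
    capture=True it returns the command's stripped stdout; otherwise the
    command inherits the terminal (useful for interactive prompts) and an
    empty string is returned.
    """
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, check=True, text=True, capture_output=capture)
    return result.stdout.strip() if capture else ""
```

Calling `run_cli(GATEWAY_TOKEN_CMD, capture=True)` would return the token text if the assumption about the command's output holds, and `run_cli(ONBOARD_FRESH_CMD)` would start onboarding in the current terminal. Passing the command as a list rather than a shell string avoids shell quoting issues.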

Topics

AI Agents
NVIDIA DGX Spark
NemoClaw
Local AI
Multi-node Clustering
Inference Optimization
Qwen3.6-35B

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.