Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA

2026-06-02 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

NVIDIA and Microsoft unveiled new tools and collaborations at NVIDIA GTC Taipei at COMPUTEX 2026 and Microsoft Build 2026 to advance on-device AI agent development for Windows PCs. The announcements include turnkey agent sandboxing, faster agentic inference, and improved multi-GPU support for frameworks such as llama.cpp and ComfyUI. Security is a core focus: Microsoft eXecution Containers (MXC) and NVIDIA OpenShell provide policy-based isolation and PII obfuscation. New hardware tailored for personal AI includes NVIDIA RTX Spark PCs, which offer 1 petaflop of AI compute and up to 128 GB of memory, and the Surface RTX Spark Dev Box. Agent capabilities expand through NVIDIA NemoClaw, Hermes Agent's native Windows app, and H Company's Holo 3.1 models, which use 35% less memory through quantization. Inference is faster in llama.cpp (up to 2x) and vLLM (2.6x). Multi-GPU support in llama.cpp (up to ~1.8x compute) and ComfyUI (up to 2x compute with CFG) scales performance further, complemented by GPU-accelerated Windows AI APIs and a media SDK.

Key takeaway

For AI engineers building on-device agents on Windows, these new tools from NVIDIA and Microsoft improve both security and performance. Integrate Microsoft eXecution Containers (MXC) and NVIDIA OpenShell to reduce prompt injection risk and run agents under policy-based isolation. Take advantage of the performance gains in updated llama.cpp and vLLM, and explore multi-GPU support in LM Studio or ComfyUI to run larger models and accelerate workflows on RTX PCs. Consider the Surface RTX Spark Dev Box if you want a pre-configured development environment.

Key insights

NVIDIA and Microsoft are enabling secure, high-performance on-device AI agent development on Windows PCs.

Principles

On-device agents need strong security primitives.
Multi-GPU configurations scale local AI performance.
Quantization and speculative decoding boost efficiency.
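
To make the quantization principle concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is an illustration only: the function names and sample weights are hypothetical, and it is not the scheme used for Holo 3.1 or any other model named in this summary. Actual memory savings depend on the bit width and on which tensors are quantized.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.823, -1.27, 0.051, 0.4, -0.337]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage per weight: 2 bytes for fp16 versus 1 byte for int8 (the scale adds a small overhead).
fp16_bytes = len(weights) * 2
int8_bytes = len(weights) * 1
```

The trade-off is a smaller footprint in exchange for a bounded rounding error per weight, which is why quantized models fit in less memory at a small cost in precision.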

Method

Microsoft eXecution Containers (MXC) define isolation policies and are integrated through NVIDIA OpenShell for secure agent deployment. Multi-Token Prediction (MTP) and Programmatic Dependent Launch (PDL) accelerate inference. Multi-GPU scaling comes from tensor parallelism in llama.cpp and from Classifier-Free Guidance (CFG) workloads in ComfyUI, where each sampling step evaluates both a conditional and an unconditional pass.
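
Multi-token drafting is typically paired with a verification step, in the same spirit as speculative decoding: several tokens are proposed cheaply, and the main model keeps only the prefix it agrees with. The sketch below shows that draft-and-verify loop with greedy decoding. The two toy "models" and all function names are hypothetical stand-ins, not the MTP implementation in llama.cpp or vLLM.

```python
def speculative_step(target_next, draft_next, context, k):
    """Draft k tokens cheaply, then keep the prefix the target model agrees with."""
    # Draft phase: the cheap predictor proposes k tokens autoregressively.
    drafted = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Verify phase: a real engine scores all drafted positions in one batched
    # pass of the target model; this sketch checks them one by one.
    accepted = []
    ctx = list(context)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: use the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # all drafts accepted: one extra token for free
    return accepted


# Toy stand-ins: the target counts upward modulo 10; the draft errs after a 3.
def target_next(ctx):
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


tokens = speculative_step(target_next, draft_next, [0], k=4)
```

Because every kept token is either confirmed or supplied by the target model, the output matches what greedy decoding with the target alone would produce. The speedup comes from emitting several tokens per verification pass when the drafts are mostly right.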

In practice

Implement MXC and OpenShell for agent security.
Use llama.cpp or vLLM for accelerated inference.
Enable tensor parallelism in LM Studio for multi-GPU inference.
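
As a rough illustration of what tensor parallelism does, the sketch below splits a weight matrix by columns, computes each shard's partial output separately (the work that would run on separate GPUs), and concatenates the results. It is plain Python with hypothetical function names, assuming a simple column split; it is not how LM Studio or llama.cpp implement the feature.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split the matrix columns into `parts` contiguous shards, one per device."""
    m = len(w[0])
    bounds = [m * p // parts for p in range(parts + 1)]
    return [[row[bounds[p]:bounds[p + 1]] for row in w] for p in range(parts)]


def tensor_parallel_matmul(x, w, parts=2):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]    # each shard would run on its own GPU
    return [v for partial in partials for v in partial]  # gather step: join the output slices


x = [1, 2]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
single = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, parts=2)
```

Each device holds only its slice of the weights, which is what lets a model too large for one GPU's memory run across several. The cost is the communication needed to gather the partial outputs after each sharded layer.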

Topics

AI Agents
On-device AI
Windows Development
NVIDIA RTX
Multi-GPU Inference
Agent Security

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.