Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA
Summary
NVIDIA and Microsoft unveiled new tools and collaborations at NVIDIA GTC Taipei at COMPUTEX 2026 and Microsoft Build 2026 that advance on-device AI agent development for Windows PCs. The announcements include turnkey agent sandboxing, faster agentic inference, and enhanced multi-GPU support for frameworks like llama.cpp and ComfyUI. Security is a core focus: Microsoft eXecution Containers (MXC) and NVIDIA OpenShell provide policy-based isolation and PII obfuscation. New hardware tailored for personal AI includes NVIDIA RTX Spark PCs, which offer 1 petaflop of AI compute and up to 128 GB of memory, and the Surface RTX Spark Dev Box. Agent capabilities expand through NVIDIA NemoClaw, Hermes Agent's native Windows app, and H Company's Holo 3.1 models, which use 35% less memory through quantization. Inference is faster for models in llama.cpp (up to 2x) and vLLM (2.6x). Multi-GPU support in llama.cpp (up to ~1.8x compute) and ComfyUI (up to 2x compute with CFG) scales performance further, complemented by GPU-accelerated Windows AI APIs and a media SDK.
Key takeaway
For AI Engineers building on-device agents for Windows, these new tools from NVIDIA and Microsoft significantly enhance security and performance. You should integrate Microsoft eXecution Containers (MXC) and NVIDIA OpenShell to mitigate prompt injection risks and ensure safe agent operation. Capitalize on the performance gains from updated llama.cpp and vLLM, and explore multi-GPU support in LM Studio or ComfyUI to run larger models and accelerate workflows on RTX PCs. Consider the Surface RTX Spark Dev Box for a pre-configured development environment.
Key insights
NVIDIA and Microsoft are enabling secure, high-performance on-device AI agent development on Windows PCs.
Principles
- On-device agents need strong security primitives.
- Multi-GPU configurations scale local AI performance.
- Quantization and speculative decoding boost efficiency.
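To make the quantization principle concrete, here is a minimal sketch of the arithmetic behind weight memory at different precisions. The 8-billion parameter count is a hypothetical placeholder, not a figure for Holo 3.1 or any model named in the article, and the function names are illustrative. It counts weights only; activations and the KV cache are not modeled, so end-to-end savings on a real workload will generally differ from the weight-only number.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def relative_saving(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return 1.0 - after / before


if __name__ == "__main__":
    # Hypothetical model size, chosen only for illustration.
    params = 8_000_000_000
    fp16 = weight_memory_gib(params, 16)
    int8 = weight_memory_gib(params, 8)
    print(f"16-bit weights: {fp16:.1f} GiB")
    print(f"8-bit weights:  {int8:.1f} GiB")
    print(f"weight-only saving: {relative_saving(fp16, int8):.0%}")
```

The same two functions can be reused to compare any pair of precisions, for example 16-bit against a 4-bit format.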
Method
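Microsoft eXecution Containers (MXC) define isolation policies, and NVIDIA OpenShell integrates them for secure agent deployment. Multi-Token Prediction (MTP) and Programmatic Dependent Launch (PDL) accelerate inference. Multi-GPU scaling comes from tensor parallelism for LLM inference and, in ComfyUI, from workflows that use Classifier-Free Guidance (CFG), whose conditional and unconditional passes are independent of each other and can therefore run in parallel.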
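One way to see why predicting several tokens at once can speed up decoding is the draft-then-verify loop used in speculative decoding. The sketch below is a toy greedy variant, not the implementation in llama.cpp or vLLM: `toy_target`, `toy_draft`, `speculative_step`, and `generate` are all hypothetical names, and the "models" are plain functions over integer tokens. A cheap drafter proposes `k` tokens, the target model checks them, and the longest correct prefix is kept, so the output matches what the target alone would produce.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token context to the next token


def toy_target(context: List[int]) -> int:
    """Stand-in for the large model (hypothetical, deterministic)."""
    return (sum(context) * 31 + len(context)) % 50


def toy_draft(context: List[int]) -> int:
    """Stand-in for a cheap drafter that is wrong at every third position."""
    guess = toy_target(context)
    return guess if len(context) % 3 else (guess + 1) % 50


def speculative_step(target: Model, draft: Model, context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the accepted tokens."""
    # 1. Draft k tokens cheaply.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. Verify. A real engine scores all k positions in one batched pass of
    #    the target model; here it is called per position for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)
    accepted.append(target(context + accepted))  # bonus token if all drafts matched
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after the prompt using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_tokens]


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [1, 2, 3], n_tokens=10))
```

Each step yields between 1 and `k + 1` tokens per target verification, which is where the speedup comes from when the drafter is usually right.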
In practice
- Implement MXC and OpenShell for agent security.
- Use llama.cpp or vLLM for accelerated inference.
- Enable tensor parallelism in LM Studio for multi-GPU inference.
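As a conceptual aid for the tensor parallelism item above, the following sketch shows the core idea with plain Python lists: a weight matrix is split by columns, each shard is multiplied independently (on a real system, each on its own GPU), and the partial outputs are concatenated. This is only an illustration under simplifying assumptions; it is not how LM Studio or llama.cpp implement multi-GPU inference, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` equal column shards, one per device."""
    n_cols = len(w[0])
    if n_cols % parts:
        raise ValueError("column count must be divisible by the number of parts")
    step = n_cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Compute x @ w by multiplying each column shard separately."""
    shards = split_columns(w, parts)
    # Each partial product is independent, so each could run on its own GPU.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the partial outputs along the column axis.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 3, 1]]
    print(column_parallel_matmul(x, w, parts=2))
    print(matmul(x, w))
```

Because every device holds only its shard of the weights, this layout is also what lets a model larger than one GPU's memory fit across several.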
Topics
- AI Agents
- On-device AI
- Windows Development
- NVIDIA RTX
- Multi-GPU Inference
- Agent Security
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.