Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA

Source: NVIDIA Technical Blog · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Intermediate, medium

Summary

NVIDIA and Microsoft unveiled new tools and collaborations at NVIDIA GTC Taipei at COMPUTEX 2026 and at Microsoft Build 2026 to advance on-device AI agent development for Windows PCs. The releases include turnkey agent sandboxing, faster agentic inference, and improved multi-GPU support for frameworks such as llama.cpp and ComfyUI. Security is a core focus: Microsoft eXecution Containers (MXC) and NVIDIA OpenShell provide policy-based isolation and PII obfuscation. New hardware tailored for personal AI includes NVIDIA RTX Spark PCs, which offer 1 petaflop of AI compute and up to 128 GB of memory, and the Surface RTX Spark Dev Box. Agent capabilities expand through NVIDIA NemoClaw, a native Windows app for Hermes Agent, and H Company's Holo 3.1 models, which use 35% less memory through quantization. Inference is faster in llama.cpp (up to 2x) and vLLM (2.6x). Multi-GPU support in llama.cpp (up to ~1.8x compute) and ComfyUI (up to 2x compute with CFG) scales performance further, complemented by GPU-accelerated Windows AI APIs and a media SDK.

Key takeaway

For AI engineers building on-device agents for Windows, these new tools from NVIDIA and Microsoft improve both security and performance. Integrate Microsoft eXecution Containers (MXC) and NVIDIA OpenShell to mitigate prompt injection risks and keep agents running inside policy-based isolation. Take advantage of the performance gains in the updated llama.cpp and vLLM, and explore multi-GPU support in LM Studio or ComfyUI to run larger models and accelerate workflows on RTX PCs. Consider the Surface RTX Spark Dev Box if you want a pre-configured development environment.

Key insights

NVIDIA and Microsoft are enabling secure, high-performance on-device AI agent development on Windows PCs.

Method

Microsoft eXecution Containers (MXC) define isolation policies, integrated via NVIDIA OpenShell for secure agent deployment. Multi-Token Prediction (MTP) and Programmatic Dependent Launch (PDL) accelerate inference. Tensor parallelism and Classifier-Free Guidance (CFG) enable multi-GPU scaling.
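
The two multi-GPU techniques named above can be illustrated with a minimal sketch in plain Python. It is not the llama.cpp or ComfyUI implementation: the function names (`cfg_combine`, `tensor_parallel_matvec`) and the row-wise sharding scheme are assumptions chosen for illustration, and the "devices" are simulated with an ordinary loop. Classifier-Free Guidance needs one conditional and one unconditional model pass per step. Because the two passes are independent, one plausible reading of the "up to 2x" figure is that each pass can run on its own GPU before the results are combined. Tensor parallelism instead splits a single layer's weights across GPUs and concatenates the partial outputs.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def cfg_combine(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Classifier-Free Guidance: uncond + scale * (cond - uncond).

    `cond` and `uncond` come from two independent model passes, so in a
    two-GPU setup each pass could be computed on a separate device.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product; each weight row yields one output value."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tensor_parallel_matvec(weights: Matrix, x: Vector, num_devices: int) -> Vector:
    """Shard the weight rows across simulated devices and concatenate results.

    Hypothetical scheme for illustration: each "device" holds a contiguous
    slice of rows and computes its part of the output independently.
    """
    shard_size = (len(weights) + num_devices - 1) // num_devices  # ceiling division
    output: Vector = []
    for device in range(num_devices):
        rows = weights[device * shard_size:(device + 1) * shard_size]
        output.extend(matvec(rows, x))  # on real hardware, shards run concurrently
    return output


if __name__ == "__main__":
    guided = cfg_combine(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=2.0)
    print(guided)

    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    x = [3.0, 4.0]
    print(tensor_parallel_matvec(w, x, num_devices=2) == matvec(w, x))
```

The sharded product matches the single-device result, which is the property tensor parallelism relies on: the split changes where the work runs, not what is computed. Real speedups stay below the device count because of inter-GPU communication, consistent with the "up to ~1.8x" figure quoted for llama.cpp.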

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.