Holotron-12B - High Throughput Computer Use Agent
Summary
H Company has released Holotron-12B, a 12-billion-parameter multimodal computer-use model post-trained from the open NVIDIA Nemotron-Nano-2 VL model on proprietary data and optimized for high-throughput inference in production agentic workloads. Its hybrid State-Space Model (SSM) and attention architecture delivers over 2x the throughput of Holo2-8B on a single H100 GPU, reaching 8.9k tokens/s at a concurrency of 100 thanks to efficient VRAM utilization. Post-training on approximately 14 billion tokens raised WebVoyager performance from 35.1% to 80.5% and produced strong gains on localization benchmarks such as OS-World-G, GroundUI, and WebClick. The model is available on Hugging Face under an NVIDIA Open Model License, and the results indicate that the Nemotron VL model is a strong foundation for real-world multimodal agents. H Company plans to use the newly announced NVIDIA Nemotron 3 Omni to further scale agentic intelligence for commercial computer-use deployments.
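To put the headline figures in perspective, the short Python sketch below works through the arithmetic they imply. It is illustrative only and rests on assumptions not stated in the summary: "8.9k" is read as exactly 8,900 tokens/s, the aggregate is assumed to be shared evenly across the 100 concurrent requests, and "over 2x" is read as a strict lower bound. The function and variable names are hypothetical.

```python
def per_request_throughput(aggregate_tokens_per_s: float, concurrency: int) -> float:
    """Average tokens/s per request, assuming the aggregate is split evenly."""
    return aggregate_tokens_per_s / concurrency


# Figures quoted in the summary (8.9k read as exactly 8,900; an assumption).
AGGREGATE_TOKENS_PER_S = 8900.0
CONCURRENCY = 100
WEBVOYAGER_BEFORE = 35.1  # percent
WEBVOYAGER_AFTER = 80.5   # percent

# Average speed seen by a single request under the even-split assumption.
per_request = per_request_throughput(AGGREGATE_TOKENS_PER_S, CONCURRENCY)

# "Over 2x" Holo2-8B implies Holo2-8B stays below half of the aggregate figure.
holo2_upper_bound = AGGREGATE_TOKENS_PER_S / 2

# WebVoyager change in percentage points and as a ratio.
gain_points = round(WEBVOYAGER_AFTER - WEBVOYAGER_BEFORE, 1)
gain_ratio = round(WEBVOYAGER_AFTER / WEBVOYAGER_BEFORE, 2)

print(f"Per-request throughput: {per_request:.0f} tokens/s")
print(f"Implied Holo2-8B throughput: below {holo2_upper_bound:.0f} tokens/s")
print(f"WebVoyager gain: {gain_points} points ({gain_ratio}x)")
```

Under these assumptions each request averages about 89 tokens/s, and the WebVoyager score rises by 45.4 percentage points, or roughly 2.3x.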
Key takeaway
Holotron-12B, a 12B multimodal computer-use agent model post-trained from NVIDIA Nemotron-Nano-2 VL, uses a hybrid State-Space Model (SSM) and attention architecture for high-throughput inference. It achieves over 2x the throughput of Holo2-8B (8.9k tokens/s at a concurrency of 100 on a single H100), and post-training raises its WebVoyager performance from 35.1% to 80.5%. This makes it well suited to throughput-bound agentic workloads such as data generation and online reinforcement learning, where efficient scaling matters for real-world autonomous computer-use deployments.
Topics
- Holotron-12B
- Multimodal Models
- State-Space Models
- Agentic AI
- NVIDIA Nemotron
Best for: AI Architect, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.