NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

· Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

NVIDIA has released Polar, a token-faithful rollout framework designed for GRPO training across various large language models, including Codex, Claude Code, and Qwen Code. Polar simplifies reinforcement learning system integration by treating the agent harness as a black box, intercepting at the model API call boundary. It supports Anthropic Messages, OpenAI Chat Completions, OpenAI Responses, and Google generateContent APIs without requiring harness code changes. The framework achieves a 5.39x speedup and 87.7% GPU utilization using its `prefix_merging` strategy compared to `per_request`. SWE-Bench verified results show significant gains, with Codex improving from 3.8% to 26.4% (+22.6 pts). Polar also functions as a distributed data generation service, producing 504 accepted SFT trajectories from 1,638 SWE-Gym attempts in approximately 64 GPU-hours using Qwen3.5-122B-A10B on 8xH100.

Key takeaway

For Machine Learning Engineers integrating LLMs into reinforcement learning systems, Polar significantly reduces integration complexity and improves training efficiency. You should evaluate its proxy design and `prefix_merging` strategy to accelerate GRPO training and data generation, especially when working with diverse LLM APIs and aiming for higher GPU utilization. This framework offers a practical approach to streamline your development workflow.

Key insights

Polar simplifies RL system integration by abstracting the agent harness via a model API proxy.

Principles

Method

Polar employs a provider-compatible proxy between the agent harness and inference server, intercepting model API calls to reconstruct token-faithful trajectories for GRPO training across diverse LLMs.

In practice

Topics

Code references

Best for: MLOps Engineer, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.