Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Double Preconditioning (DoPr), introduced on June 4, 2026, is a new optimization paradigm designed to improve test-time performance in deep learning settings affected by "test-time feedback" (TTF). TTF describes the growing mismatch between training or validation loss and downstream metrics (e.g., task success, generation quality) that arises when a model rolls out its own predictions. It is common in autoregressive language modeling, flow-based generative modeling, and robot policy learning. DoPr combines two preconditioners: activation-wise preconditioning (AP), which encourages uniform feature learning by debiasing gradients from activation statistics, and standard gradient-wise preconditioning (GP), such as Adam or Muon, which stabilizes and accelerates training. Experiments across continuous control (Humanoid-v5), image-based robot policy learning (Robomimic), and LLM fine-tuning (Llama-3.2-3B, Llama-3.1-8B) show that DoPr consistently boosts downstream performance, often without improving validation loss. This points to a design space for optimizers that extends beyond loss convergence.

Key takeaway

Machine learning engineers building models for autoregressive generation or sequential decision-making should consider adopting Double Preconditioning (DoPr) to improve real-world task performance. In test-time feedback (TTF) settings, validation loss may not accurately reflect downstream success. DoPr is a plug-in method that improves feature learning and mitigates error accumulation, even when it does not reduce training loss. The practical goal shifts from minimizing loss to improving downstream metrics such as task success rate or generation quality.

Key insights

Test-time feedback (TTF) causes validation loss to misalign with downstream performance; DoPr mitigates this by improving feature learning.
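
To make the mismatch concrete, here is a toy illustration that is not taken from the paper. It assumes a scalar system x_{t+1} = a * x_t and a model whose coefficient is slightly off. The function names and the numbers (a = 1.0, model coefficient 1.02, 50 steps) are hypothetical choices for this sketch. The teacher-forced error plays the role of validation loss: the model always predicts from ground-truth inputs. The rollout error plays the role of the downstream metric: the model consumes its own predictions.

```python
def teacher_forced_error(a_true, a_model, x0, steps):
    """Mean one-step error when the model always sees the true state."""
    x, total = x0, 0.0
    for _ in range(steps):
        total += abs(a_model * x - a_true * x)  # one-step prediction error
        x = a_true * x                          # advance along the true trajectory
    return total / steps


def rollout_error(a_true, a_model, x0, steps):
    """Final-state error when the model feeds on its own predictions."""
    x_true = x_model = x0
    for _ in range(steps):
        x_true = a_true * x_true
        x_model = a_model * x_model             # the model's own output is its next input
    return abs(x_model - x_true)


if __name__ == "__main__":
    one_step = teacher_forced_error(1.0, 1.02, 1.0, 50)
    rolled = rollout_error(1.0, 1.02, 1.0, 50)
    print(f"teacher-forced error: {one_step:.4f}")
    print(f"rollout error after 50 steps: {rolled:.4f}")
```

In this toy the one-step error stays flat while the rollout error keeps growing with the horizon, so two models with nearly identical one-step loss could still differ widely once they roll out their own predictions.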

Method

DoPr applies an activation-covariance preconditioner (AP) to the layer-wise gradient, then passes this AP-gradient to a gradient preconditioner (GP) like Adam or Muon, followed by a standard weight update.
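
The pipeline described above can be sketched in NumPy for a single linear layer. This is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not say exactly how the AP step is computed, so this sketch assumes right-multiplication of the gradient by a damped inverse of the uncentered, per-batch activation covariance. The damping value, the helper names (`activation_precondition`, `dopr_step`, `demo`), and the toy regression problem are all hypothetical. Adam stands in for the GP step.

```python
import numpy as np


def activation_precondition(grad, acts, damping=1e-3):
    """AP step (assumed form): grad @ inv(cov + damping * I).

    grad: (d_out, d_in) gradient of a linear layer's weight.
    acts: (batch, d_in) inputs to that layer.
    """
    batch, d_in = acts.shape
    cov = acts.T @ acts / batch                 # uncentered activation covariance
    cov_damped = cov + damping * np.eye(d_in)   # damping keeps the solve well-posed
    # cov_damped is symmetric, so solving against grad.T yields (grad @ inv)^T.
    return np.linalg.solve(cov_damped, grad.T).T


class Adam:
    """Plain Adam for one weight matrix, standing in for the GP step."""

    def __init__(self, shape, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.t = 0

    def step(self, weight, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)   # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2**self.t)   # bias-corrected second moment
        return weight - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def dopr_step(weight, acts, grad, optimizer, damping=1e-3):
    """One DoPr-style update: AP on the layer gradient, then GP, then the weight update."""
    ap_grad = activation_precondition(grad, acts, damping)
    return optimizer.step(weight, ap_grad)


def demo(steps=200, seed=0):
    """Toy linear regression with badly scaled input features (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    acts = rng.normal(size=(256, 4)) * np.array([10.0, 1.0, 0.1, 0.01])
    target_w = rng.normal(size=(3, 4))
    targets = acts @ target_w.T
    weight = np.zeros((3, 4))
    opt = Adam(weight.shape)

    def mse(w):
        return float(np.mean((acts @ w.T - targets) ** 2))

    initial = mse(weight)
    for _ in range(steps):
        residual = acts @ weight.T - targets        # (batch, d_out)
        grad = residual.T @ acts / acts.shape[0]    # gradient of 0.5 * batch-mean squared error
        weight = dopr_step(weight, acts, grad, opt)
    return initial, mse(weight)


if __name__ == "__main__":
    before, after = demo()
    print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The ordering is the point of the sketch: the activation statistics are divided out of the raw layer gradient first, and only then does the gradient preconditioner see it. A practical version would likely need a running covariance estimate and a cheaper inverse for wide layers. Both are left out here because the summary does not specify them.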

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.