Plan Before You Trade: Inference-Time Optimization for RL Trading Agents
Summary
FPILOT (Financial Plugin Inference-time Learning for Optimal Trading) is a proposed framework for enhancing reinforcement learning (RL) agents for portfolio management. The paper, submitted on May 12, 2026, addresses the limitation of static RL policies by adding inference-time optimization driven by price forecasts. Inspired by Model Predictive Control (MPC), FPILOT uses a predictive model to generate multi-step price trajectories, under the assumption that a single agent's portfolio allocation has minimal impact on future prices. At each decision step, it constructs an allocation-based imagined return objective and optimizes the policy against it before executing a trade, without retraining the agent. Evaluated on the TradeMaster DJ30 benchmark across five policy learning algorithms, FPILOT consistently improved total return and risk-adjusted metrics such as the Sharpe, Sortino, and Calmar ratios, with stochastic policies benefiting more. Performance gains also correlated directly with the quality of the synthetic forecasts.
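For readers less familiar with the risk-adjusted metrics named above, the following is a minimal sketch of how they are commonly computed from a series of per-period portfolio returns. The annualization factor of 252 trading days, the zero risk-free rate, and the zero downside target are assumptions made for illustration; the TradeMaster benchmark may use different conventions.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean return divided by return volatility (risk-free rate assumed 0)."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def sortino_ratio(returns, periods_per_year=252):
    """Like Sharpe, but only returns below the target (assumed 0) count as risk."""
    r = np.asarray(returns, dtype=float)
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return np.sqrt(periods_per_year) * r.mean() / downside

def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction of the peak."""
    r = np.asarray(returns, dtype=float)
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + r)])  # start from wealth 1
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))

def calmar_ratio(returns, periods_per_year=252):
    """Annualized compound return divided by maximum drawdown."""
    r = np.asarray(returns, dtype=float)
    annual_return = np.prod(1.0 + r) ** (periods_per_year / len(r)) - 1.0
    return annual_return / max_drawdown(r)

# Illustrative input only; these numbers are made up, not results from the paper.
example_returns = [0.01, -0.02, 0.015, 0.005, 0.01]
print(sharpe_ratio(example_returns), sortino_ratio(example_returns),
      calmar_ratio(example_returns))
```

All three ratios divide a return measure by a risk measure, so a method that raises return without raising volatility, downside deviation, or drawdown improves all of them at once.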
Key takeaway
For quantitative analysts and AI scientists building portfolio management systems, FPILOT offers a way to improve existing RL trading agents without retraining them. Consider adding this inference-time optimization step so that your pre-trained policies adapt to current price forecasts; in the reported experiments this raised total return and improved risk-adjusted metrics such as the Sharpe and Sortino ratios. Because the gains correlated with forecast quality, effort spent on improving your financial forecasting models is likely to carry over to trading performance.
Key insights
FPILOT enhances RL trading agents by integrating inference-time optimization with price forecasts, improving returns and risk metrics.
Principles
- Future prices are largely independent of one agent's actions.
- Inference-time optimization can adapt pre-trained policies.
- Forecaster quality directly correlates with trading performance gains.
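To study the third principle in your own setup, one option is to generate forecasts whose quality you control. The sketch below is a hypothetical construction, not the procedure used in the paper: it blends the true future log returns with Gaussian noise through a `quality` parameter (an assumed name), where 1.0 reproduces the true path and 0.0 yields pure noise.

```python
import numpy as np

def synthetic_forecast(true_prices, quality, rng):
    """Return a forecast price path of the same shape as true_prices.

    true_prices: array of shape (H + 1, n_assets); row 0 holds current prices.
    quality:     blend weight in [0, 1]; 1.0 gives the true path (assumption).
    rng:         a numpy random Generator, passed in for reproducibility.
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [0, 1]")
    true_prices = np.asarray(true_prices, dtype=float)
    log_ret = np.diff(np.log(true_prices), axis=0)           # true log returns
    noise = rng.normal(0.0, log_ret.std() + 1e-12, size=log_ret.shape)
    blended = quality * log_ret + (1.0 - quality) * noise    # degrade the signal
    # Rebuild prices: start at the current log price and accumulate returns.
    log_path = np.vstack([np.log(true_prices[0]), blended]).cumsum(axis=0)
    return np.exp(log_path)

# Illustrative usage with made-up prices.
prices = np.array([[100.0, 50.0], [101.0, 49.5], [102.5, 49.0], [103.0, 48.0]])
forecast = synthetic_forecast(prices, quality=0.8, rng=np.random.default_rng(0))
print(forecast.shape)
```

Sweeping `quality` from 0 to 1 and re-running a backtest at each value is one way to check how strongly performance depends on the forecaster.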
Method
FPILOT uses a predictive model to generate multi-step price trajectories, then optimizes the pre-trained policy at inference time against an imagined return objective computed from those trajectories. Only then does it execute a single trade step, and the process repeats at the next decision step.
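The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the policy is reduced to a vector of allocation logits passed through a softmax, the imagined return is taken to be the sum of log portfolio growth over the forecast horizon, and the optimizer is plain gradient ascent. The function names, the step count, and the learning rate are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Map logits to long-only portfolio weights that sum to 1."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def imagined_return(weights, forecast_prices):
    """Sum of log portfolio growth if `weights` were held over the forecast.

    forecast_prices has shape (H + 1, n_assets); row 0 holds current prices.
    Prices are treated as unaffected by the allocation, as the source assumes.
    """
    step_returns = forecast_prices[1:] / forecast_prices[:-1] - 1.0
    return float(np.sum(np.log1p(step_returns @ weights)))

def refine_allocation(logits, forecast_prices, steps=20, lr=0.5):
    """Gradient ascent on the imagined return, starting from the policy's logits.

    Works on a copy, so the pre-trained policy output is left untouched.
    """
    z = np.array(logits, dtype=float)
    step_returns = forecast_prices[1:] / forecast_prices[:-1] - 1.0
    for _ in range(steps):
        w = softmax(z)
        growth = 1.0 + step_returns @ w                        # shape (H,)
        grad_w = (step_returns / growth[:, None]).sum(axis=0)  # dJ/dw
        grad_z = w * (grad_w - w @ grad_w)                     # softmax chain rule
        z += lr * grad_z
    return softmax(z)

# Illustrative usage: made-up forecast in which asset 0 rises and asset 1 falls.
forecast = np.array([[100.0, 100.0, 100.0],
                     [101.0,  99.0, 100.0],
                     [102.0,  98.0, 100.0],
                     [103.0,  97.0, 100.0]])
policy_logits = np.zeros(3)  # stand-in for a pre-trained policy's output
weights = refine_allocation(policy_logits, forecast)
print(weights)               # only this allocation is executed for one step
```

As in MPC, only the first action is executed; at the next decision step a fresh forecast is drawn and the refinement starts again from the pre-trained policy. Using few steps and a small learning rate keeps the refined allocation close to what the original policy proposed, which limits the damage when the forecast is poor.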
In practice
- Apply FPILOT to existing RL trading agents.
- Prioritize high-quality financial forecasting models.
- Expect the largest gains from FPILOT when the underlying policy is stochastic.
Topics
- FPILOT
- Reinforcement Learning
- Portfolio Management
- Inference-time Optimization
- Model Predictive Control
Best for: AI Scientist, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.