The World Leaks the Future: Harness Evolution for Future Prediction Agents
Summary
Milkyway is a self-evolving agent system designed for future prediction tasks, where outcomes are unknown at prediction time and public evidence evolves. Unlike traditional methods that primarily learn from final outcomes, Milkyway keeps its base model fixed and instead updates a persistent "future prediction harness." This harness manages factor tracking, evidence gathering and interpretation, and uncertainty handling. The system extracts "internal feedback" by comparing earlier and later predictions on the same unresolved question, using temporal contrasts to identify omissions and improve the harness before the outcome is known. After a question resolves, the final outcome provides a "retrospective check" to validate harness updates before they are applied to subsequent questions. Milkyway achieved the best overall scores on the FutureX and FutureWorld benchmarks, improving FutureX from 44.07 to 60.90 and FutureWorld from 62.22 to 77.96.
Key takeaway
Research scientists developing forecasting agents should consider a self-evolving design like Milkyway's. Using "internal feedback" from repeated predictions on unresolved questions to update a persistent "future prediction harness" can improve prediction accuracy before final outcomes are known, which complements traditional outcome-based learning.
Key insights
Internal feedback from temporal prediction contrasts can improve future prediction agents before outcomes are known.
Principles
- Harness evolution improves agent performance.
- Internal feedback guides pre-resolution updates.
- Final outcomes provide retrospective checks.
Method
Milkyway updates a persistent "future prediction harness" using internal feedback from temporal contrasts across repeated predictions on unresolved questions, with final outcomes serving as retrospective checks.
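To make the idea of a temporal contrast concrete, here is a minimal sketch, not Milkyway's actual implementation. It assumes each prediction records a probability and the set of factors the agent considered; the `Prediction` schema, the `temporal_contrast` function, and the example factor names are all hypothetical. The contrast surfaces factors that appear only in the later prediction, which can be read as omissions in the earlier one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prediction:
    """One forecast on a still-unresolved question (hypothetical schema)."""
    question_id: str
    step: int                  # position of this prediction in time
    probability: float         # forecast probability of the "yes" outcome
    factors: frozenset = field(default_factory=frozenset)


def temporal_contrast(earlier: Prediction, later: Prediction) -> dict:
    """Compare two predictions on the same question made at different times.

    Assumption: factors present only in the later prediction are treated
    as omissions of the earlier one. No final outcome is needed here.
    """
    if earlier.question_id != later.question_id:
        raise ValueError("predictions must refer to the same question")
    if earlier.step >= later.step:
        raise ValueError("'earlier' must precede 'later'")
    return {
        "question_id": earlier.question_id,
        "missed_factors": sorted(later.factors - earlier.factors),
        "probability_shift": later.probability - earlier.probability,
    }


# Illustrative data only; not taken from the paper.
p1 = Prediction("q1", 0, 0.40, frozenset({"polls"}))
p2 = Prediction("q1", 1, 0.65, frozenset({"polls", "endorsement"}))
feedback = temporal_contrast(p1, p2)
print(feedback["missed_factors"])
```

In this sketch the feedback is available as soon as the second prediction exists, which is what allows a harness to be revised before the question resolves.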
In practice
- Implement a persistent external harness for agent guidance.
- Extract internal feedback from temporal prediction differences.
- Use final outcomes to validate provisional updates.
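The three steps above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the paper's procedure: the `Harness` and `ProvisionalUpdate` classes are hypothetical, and the retrospective check used here (keep an update only if the prediction made after it has a lower Brier score against the resolved outcome) is one plausible acceptance criterion that the source does not specify.

```python
from dataclasses import dataclass, field


def brier(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2


@dataclass
class ProvisionalUpdate:
    """A harness change proposed from internal feedback (hypothetical)."""
    question_id: str
    guidance: str          # text that would be added to the harness
    prob_before: float     # forecast made before the update
    prob_after: float      # forecast made after the update


@dataclass
class Harness:
    """Persistent guidance kept outside the fixed base model."""
    guidance: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def propose(self, update: ProvisionalUpdate) -> None:
        # Pre-resolution: store the update without committing it.
        self.pending.append(update)

    def resolve(self, question_id: str, outcome: int) -> list:
        """Retrospective check once a question's outcome is known."""
        accepted, still_pending = [], []
        for update in self.pending:
            if update.question_id != question_id:
                still_pending.append(update)
                continue
            # Assumed criterion: the update must have moved the forecast
            # closer to the realized outcome.
            if brier(update.prob_after, outcome) < brier(update.prob_before, outcome):
                self.guidance.append(update.guidance)
                accepted.append(update.guidance)
        self.pending = still_pending
        return accepted


# Illustrative data only; not taken from the paper.
harness = Harness()
harness.propose(ProvisionalUpdate("q1", "Track late endorsements.", 0.40, 0.65))
harness.propose(ProvisionalUpdate("q1", "Discount all polling.", 0.50, 0.20))
harness.propose(ProvisionalUpdate("q2", "Check regulatory filings.", 0.30, 0.45))
accepted = harness.resolve("q1", outcome=1)
print(accepted)
```

Keeping provisional updates separate from committed guidance means a change that looked helpful before resolution but moved the forecast the wrong way never reaches subsequent questions.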
Topics
- Milkyway System
- Future Prediction Agents
- Internal Feedback
- Future Prediction Harness
- Temporal Contrasts
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.