Solving Path of Exile item crafting with Reinforcement Learning
Summary
This article, published July 13, 2024, details a model-based Reinforcement Learning (RL) approach to optimizing item crafting in Path of Exile (PoE), a complex ARPG. It addresses the challenge of finding optimal action sequences for crafting a desired item, a process that can involve hundreds of stochastic steps and many different crafting currencies. The author formalizes the problem as a Markov Decision Process (MDP) and develops a compact feature vector representation of item states that captures the target modifiers and how full the item is. Traditional game tree search methods such as Minimax or MCTS struggle with PoE's cyclic state graph and large stochastic branching factors; the RL approach instead learns a model of the crafting dynamics in feature space. This model is then used with Q-value iteration to derive optimal policies, which the author successfully applies to complex crafting scenarios, including a Boneshatter Axe and a Magic Jewel, optimizing either for path length (fewest steps) or for cost (cheapest currency).
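To make the feature vector idea concrete, here is a minimal Python sketch. It assumes a rare item with at most three prefixes and three suffixes, and encodes an item as one 0/1 flag per target modifier plus the number of open prefix and suffix slots. The `Item` class, the `feature_vector` function, and the modifier names are hypothetical illustrations, not the author's code, and the article's actual feature vector may contain more or different components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical rare item: sets of modifier names on each affix side."""
    prefixes: frozenset[str]
    suffixes: frozenset[str]
    max_prefixes: int = 3  # assumed affix limits for a rare item
    max_suffixes: int = 3


def feature_vector(item: Item, target_mods: list[str]) -> tuple[int, ...]:
    """Encode an item as (one 0/1 flag per target mod, open prefix slots, open suffix slots)."""
    present = item.prefixes | item.suffixes
    flags = tuple(int(mod in present) for mod in target_mods)
    # "Fullness": how many affix slots remain open on each side.
    open_prefixes = item.max_prefixes - len(item.prefixes)
    open_suffixes = item.max_suffixes - len(item.suffixes)
    return flags + (open_prefixes, open_suffixes)


targets = ["phys_pct", "attack_speed", "crit_multi"]  # hypothetical mod names
a = Item(prefixes=frozenset({"phys_pct", "life"}), suffixes=frozenset({"attack_speed"}))
b = Item(prefixes=frozenset({"phys_pct", "mana"}), suffixes=frozenset({"attack_speed"}))
print(feature_vector(a, targets))  # (1, 1, 0, 1, 2)
print(feature_vector(a, targets) == feature_vector(b, targets))  # True
```

Items `a` and `b` differ only in a non-target modifier, so they share a feature vector. Collapsing irrelevant detail this way is what keeps a model learned in feature space small enough to solve.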
Key takeaway
For AI Engineers or Data Scientists tackling complex, stochastic optimization problems with known but intractable dynamics, consider a model-based Reinforcement Learning approach. Your team should prioritize robust feature engineering and learning a model of the dynamics over learning a policy directly, especially when the environment dynamics can be sampled. This strategy can yield efficient, tunable solutions for long, complex action sequences, even in domains with large state spaces and high branching factors.
Key insights
Reinforcement Learning effectively optimizes complex, stochastic crafting processes in Path of Exile by modeling environmental dynamics.
Principles
- Model-based RL outperforms model-free RL when the dynamics are known but complex.
- Feature engineering is critical for state representation in RL.
- Reward functions dictate optimization goals (e.g., speed vs. cost).
Method
The method involves formalizing crafting as an MDP, designing a compact feature vector for item states, learning a transition model via sampling, and solving the MDP using Q-value iteration to derive optimal crafting policies.
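The summary does not say exactly how the transition model is fit. One simple assumption is to estimate P(s' | s, a) by repeatedly sampling a crafting simulator from each state-action pair and normalizing the outcome counts, as in this sketch. `estimate_model`, `toy_simulate`, and the toy probabilities are hypothetical; the toy state is just the number of target modifiers present (0 to 2) rather than a full feature vector.

```python
import random
from collections import Counter
from typing import Callable, Hashable

State = Hashable
Action = str
TransitionModel = dict[tuple[State, Action], dict[State, float]]


def estimate_model(
    simulate: Callable[[State, Action, random.Random], State],
    states: list[State],
    actions: list[Action],
    samples_per_pair: int = 1000,
    seed: int = 0,
) -> TransitionModel:
    """Estimate P(s' | s, a) by sampling the simulator and normalizing outcome counts."""
    rng = random.Random(seed)
    model: TransitionModel = {}
    for s in states:
        for a in actions:
            counts = Counter(simulate(s, a, rng) for _ in range(samples_per_pair))
            model[(s, a)] = {s2: n / samples_per_pair for s2, n in counts.items()}
    return model


def toy_simulate(s: int, a: str, rng: random.Random) -> int:
    """Hypothetical dynamics; the state is the number of target mods present (0-2)."""
    if a == "reroll":  # rerolls the whole item, ignoring the current state
        return rng.choices([0, 1, 2], weights=[0.6, 0.3, 0.1])[0]
    if a == "add":  # adds one modifier, which is a target mod half the time
        return min(s + 1, 2) if rng.random() < 0.5 else s
    raise ValueError(f"unknown action: {a}")


model = estimate_model(toy_simulate, states=[0, 1, 2], actions=["reroll", "add"])
print(model[(0, "add")])  # estimated probabilities near 0.5 for states 0 and 1
```

The resulting table of estimated probabilities is the kind of input a planner such as Q-value iteration consumes. In the real problem, the states would be item feature vectors and the simulator would apply actual crafting currencies.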
In practice
- Use a constant reward of -1 per action to optimize for the shortest path (fewest steps).
- Use negative action costs as rewards to optimize for the cheapest craft.
- Employ heuristics to prune the action space and improve efficiency.
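The effect of the reward choice can be seen on a tiny hand-written model. In this sketch, all states, actions, probabilities, and costs are invented for illustration: a cheap action (cost 1) succeeds 10% of the time and an expensive one (cost 20) succeeds 90% of the time. Q-value iteration with the constant -1 reward prefers the expensive action (about 1.1 expected steps versus 10), while the negative-cost reward prefers the cheap one (expected cost 10 versus about 22.2). `prune_noops` is a hypothetical example of an action-pruning heuristic, not one taken from the article.

```python
from typing import Callable

# Hypothetical model format: model[state][action] = [(probability, next_state), ...].
# States with no actions are terminal and have value 0.
Model = dict[str, dict[str, list[tuple[float, str]]]]


def prune_noops(model: Model) -> Model:
    """Hypothetical pruning heuristic: drop actions that can never change the state."""
    return {
        s: {a: outs for a, outs in acts.items() if any(s2 != s for _, s2 in outs)}
        for s, acts in model.items()
    }


def q_value_iteration(
    model: Model,
    reward: Callable[[str, str], float],
    gamma: float = 1.0,
    tol: float = 1e-9,
    max_iter: int = 100_000,
) -> tuple[dict[str, dict[str, float]], dict[str, str]]:
    """Synchronous Q-value iteration; returns Q-values and the greedy policy."""
    q = {s: {a: 0.0 for a in acts} for s, acts in model.items()}
    for _ in range(max_iter):
        # Current state values: best Q-value, or 0 for terminal states.
        v = {s: max(qs.values()) if qs else 0.0 for s, qs in q.items()}
        new_q = {
            s: {
                a: reward(s, a) + gamma * sum(p * v[s2] for p, s2 in outs)
                for a, outs in acts.items()
            }
            for s, acts in model.items()
        }
        delta = max(
            (abs(new_q[s][a] - q[s][a]) for s in q for a in q[s]), default=0.0
        )
        q = new_q
        if delta < tol:
            break
    policy = {s: max(qs, key=qs.get) for s, qs in q.items() if qs}
    return q, policy


# Toy crafting problem (all numbers invented).
model: Model = {
    "start": {
        "cheap": [(0.1, "done"), (0.9, "start")],   # costs 1, succeeds 10% of the time
        "pricey": [(0.9, "done"), (0.1, "start")],  # costs 20, succeeds 90% of the time
        "noop": [(1.0, "start")],                   # never changes the item
    },
    "done": {},
}
cost = {"cheap": 1.0, "pricey": 20.0, "noop": 0.0}


def shortest_path_reward(s: str, a: str) -> float:
    return -1.0  # every action costs one step


def cheapest_reward(s: str, a: str) -> float:
    return -cost[a]  # every action costs its currency price


pruned = prune_noops(model)
_, fewest_steps_policy = q_value_iteration(pruned, shortest_path_reward)
_, cheapest_policy = q_value_iteration(pruned, cheapest_reward)
print(fewest_steps_policy)  # {'start': 'pricey'}
print(cheapest_policy)      # {'start': 'cheap'}
```

Pruning matters beyond speed here: without it, the zero-cost `noop` self-loop scores 0 under the cost reward with no discounting, so the cost-optimizing solver would pick an action that never finishes the craft.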
Topics
- Reinforcement Learning
- Path of Exile Crafting
- Markov Decision Processes
- Q-value Iteration
- Game Item Modifiers
Best for: Machine Learning Engineer, AI Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Denny's Blog.