Solving Path of Exile item crafting with Reinforcement Learning

2024-07-12 · Source: Denny's Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Gaming & Interactive Media · Depth: Intermediate, extended

Summary

This article, published July 13, 2024, details a model-based Reinforcement Learning (RL) approach to optimizing item crafting in Path of Exile (PoE), a complex action RPG. It addresses the challenge of finding optimal action sequences for crafting a desired item, a process that often involves hundreds of stochastic steps and several crafting currencies. The author formalizes the problem as a Markov Decision Process (MDP) and develops a compact feature vector representation for item states that captures target modifiers and item fullness. Traditional game tree search methods such as Minimax or MCTS struggle with PoE's cyclic graph and large stochastic branching factors, so the RL approach instead learns a model of the crafting dynamics in feature space. This model is then used with Q-value iteration to derive optimal policies. The article applies the approach to complex crafting scenarios, including a Boneshatter Axe and a Magic Jewel, optimizing for both path length (fewest steps) and cost (cheapest currency).

Key takeaway

For AI Engineers or Data Scientists tackling complex, stochastic optimization problems with known but intractable dynamics, consider a model-based Reinforcement Learning approach. Your team should prioritize robust feature engineering and model learning over direct policy learning, especially when environment dynamics can be sampled. This strategy can yield efficient, tunable solutions for complex sequences, even in domains with large state spaces and high branching factors.

Key insights

Reinforcement Learning effectively optimizes complex, stochastic crafting processes in Path of Exile by modeling environmental dynamics.

Principles

Model-based RL is a strong fit when dynamics are known and can be sampled, even if they are complex.
Feature engineering is critical for state representation in RL.
Reward functions dictate optimization goals (e.g., speed vs. cost).
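
To make the feature engineering principle concrete, here is a minimal sketch of a compact state encoding that records which target modifiers are present and whether the item's slots are full. The `Item` class, the `featurize` function, the modifier names, and the three-prefix/three-suffix limit are illustrative assumptions, not the article's actual representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical item: just the modifier names it currently carries."""
    prefixes: Tuple[str, ...]
    suffixes: Tuple[str, ...]
    max_prefixes: int = 3  # assumed slot limit for this sketch
    max_suffixes: int = 3  # assumed slot limit for this sketch


def featurize(item: Item, target_mods: Tuple[str, ...]) -> Tuple[int, ...]:
    """Map an item to a small tuple: one bit per target modifier,
    followed by two 'fullness' bits (prefix slots full, suffix slots full)."""
    present = set(item.prefixes) | set(item.suffixes)
    target_bits = tuple(int(mod in present) for mod in target_mods)
    prefixes_full = int(len(item.prefixes) >= item.max_prefixes)
    suffixes_full = int(len(item.suffixes) >= item.max_suffixes)
    return target_bits + (prefixes_full, suffixes_full)


# Example: two of three targets present, suffix slots full.
example_item = Item(prefixes=("life",), suffixes=("fire_res", "cold_res", "speed"))
example_features = featurize(example_item, ("life", "speed", "crit"))
print(example_features)
```

Many distinct items collapse onto the same feature tuple, which is what keeps the state space small enough to learn a model over.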

Method

The method involves formalizing crafting as an MDP, designing a compact feature vector for item states, learning a transition model via sampling, and solving the MDP using Q-value iteration to derive optimal crafting policies.
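
The pipeline described above can be sketched on a toy problem: sample transitions from a simulator, turn the counts into a transition model, then run Q-value iteration on that model. Everything concrete here is an assumption for illustration: the state is a single feature (number of target modifiers, 0 to 2), the two currencies `reroll` and `add` are hypothetical, and the odds inside `simulate` are made up rather than taken from the game or the article.

```python
import random
from collections import Counter

GOAL = 2                      # terminal state: both target modifiers present
STATES = (0, 1)               # non-terminal feature states
ACTIONS = ("reroll", "add")   # hypothetical currencies


def simulate(state, action, rng):
    """Stand-in for a crafting simulator; all probabilities are made up."""
    u = rng.random()
    if action == "reroll":    # outcome ignores the current state
        return 0 if u < 0.55 else (1 if u < 0.95 else 2)
    if state == 0:            # "add" on an item with no target modifiers
        return 1 if u < 0.2 else 0
    return 2 if u < 0.4 else 0  # "add" at state 1: succeed or lose progress


def learn_model(n_samples, rng):
    """Estimate P(next_state | state, action) from sampled transitions."""
    model = {}
    for s in STATES:
        for a in ACTIONS:
            counts = Counter(simulate(s, a, rng) for _ in range(n_samples))
            model[(s, a)] = {s2: n / n_samples for s2, n in counts.items()}
    return model


def q_value_iteration(model, reward=-1.0, tol=1e-9, max_iters=10_000):
    """Undiscounted Q-value iteration; the goal state has value 0."""
    q = {sa: 0.0 for sa in model}
    for _ in range(max_iters):
        v = {s: max(q[(s, a)] for a in ACTIONS) for s in STATES}
        v[GOAL] = 0.0
        new_q = {
            sa: reward + sum(p * v[s2] for s2, p in dist.items())
            for sa, dist in model.items()
        }
        delta = max(abs(new_q[sa] - q[sa]) for sa in q)
        q = new_q
        if delta < tol:
            break
    return q


rng = random.Random(0)
model = learn_model(10_000, rng)
q = q_value_iteration(model)
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

Because the solver only sees the learned `model`, the simulator can be swapped for any sampler of crafting outcomes without touching the planning code.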

In practice

Use a constant -1 reward for shortest path optimization.
Define rewards as negative action costs for cheapest crafting.
Employ heuristics to prune action space and improve efficiency.
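
The two reward choices above can lead to different policies on the same problem. The sketch below uses a deliberately tiny case with one non-terminal state and two hypothetical orbs whose costs and success chances are made up. With a single state, repeating one action until it succeeds takes 1/p attempts on average, so the expected return of each action is simply its per-attempt reward divided by its success probability.

```python
# Hypothetical currencies; costs and success chances are made up.
ORBS = {
    "cheap_orb": {"cost": 1.0, "p_success": 0.1},
    "expensive_orb": {"cost": 20.0, "p_success": 0.5},
}


def step_reward(action):
    """Shortest-path objective: every action costs one step."""
    return -1.0


def cost_reward(action):
    """Cheapest-craft objective: reward is the negative currency cost."""
    return -ORBS[action]["cost"]


def expected_return(action, reward_fn):
    """Expected total reward when repeating `action` until it succeeds
    (mean of a geometric distribution is 1 / p_success attempts)."""
    return reward_fn(action) / ORBS[action]["p_success"]


def best_action(reward_fn):
    """Pick the action with the highest (least negative) expected return."""
    return max(ORBS, key=lambda a: expected_return(a, reward_fn))


print("fewest steps:", best_action(step_reward))
print("cheapest:", best_action(cost_reward))
```

Under the step reward the expensive orb wins (2 expected attempts versus 10), while under the cost reward the cheap orb wins (10 expected currency units versus 40), so the reward definition alone flips the recommended action.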

Topics

Reinforcement Learning
Path of Exile Crafting
Markov Decision Processes
Q-value Iteration
Game Item Modifiers

Code references

lightvector/KataGo

Best for: Machine Learning Engineer, AI Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Denny's Blog.