Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
Summary
Mahjax is a GPU-accelerated Riichi Mahjong simulator written in JAX and designed for reinforcement learning research, particularly *tabula rasa* learning approaches. Riichi Mahjong is a multi-player, imperfect-information game with high stochasticity and a complex state space, which makes it a challenging environment for AI. Unlike prior methods that rely on supervised learning from human play logs, Mahjax supports learning from scratch by parallelizing rollouts at large scale on GPUs. On eight NVIDIA A100 GPUs, the simulator reaches throughputs of up to 2 million steps per second under no-red rules and 1 million steps per second under red rules. It also includes a visualization tool for debugging. In the reported experiments, agents trained with Mahjax improve their rank against baseline policies.
Key takeaway
For machine learning engineers developing reinforcement learning agents for complex, imperfect-information games, Mahjax provides a GPU-accelerated JAX environment that runs millions of simulation steps per second. That throughput shortens iteration cycles for *tabula rasa* learning research and may help with training agents for domains that mirror real-world decision-making challenges. Consider integrating Mahjax for large-scale experimentation and benchmarking.
Key insights
Mahjax provides a high-throughput, GPU-accelerated JAX environment for *tabula rasa* reinforcement learning in complex imperfect-information games.
Principles
- Vectorized JAX environments enable massive RL parallelization.
- Imperfect-information games challenge *tabula rasa* learning.
- High-throughput simulation accelerates algorithm development.
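The first principle can be illustrated with a minimal sketch in JAX. This is not Mahjax's API: the `init` and `step` functions, the state layout, and the batch size of 1024 are all assumptions made up for illustration, and the "game" only draws one tile and optionally discards one. The point is the pattern of writing a single-environment transition as a pure function and letting `jax.vmap` and `jax.jit` turn it into a batched, compiled one.

```python
import jax
import jax.numpy as jnp

NUM_TILE_TYPES = 34   # Riichi Mahjong has 34 distinct tile kinds
COPIES = 4            # four copies of each kind, 136 tiles in total

def init(key):
    """Shuffle a 136-tile wall and return a toy state (hypothetical layout)."""
    tiles = jnp.repeat(jnp.arange(NUM_TILE_TYPES), COPIES)
    wall = jax.random.permutation(key, tiles)
    hand = jnp.zeros(NUM_TILE_TYPES, dtype=jnp.int32)
    return {"wall": wall, "next": jnp.int32(0), "hand": hand}

def step(state, discard):
    """Draw the next tile from the wall, then discard one tile kind if held."""
    drawn = state["wall"][state["next"]]
    hand = state["hand"].at[drawn].add(1)
    # Only remove a tile when the hand actually holds that kind.
    can_discard = hand[discard] > 0
    hand = hand.at[discard].add(-can_discard.astype(jnp.int32))
    return {"wall": state["wall"], "next": state["next"] + 1, "hand": hand}

# vmap maps the single-environment functions over a leading batch axis;
# jit compiles the batched functions for the available accelerator.
batched_init = jax.jit(jax.vmap(init))
batched_step = jax.jit(jax.vmap(step))

keys = jax.random.split(jax.random.PRNGKey(0), 1024)  # one key per environment
states = batched_init(keys)
actions = jnp.zeros(1024, dtype=jnp.int32)            # every env tries to discard kind 0
states = batched_step(states, actions)
```

Because `step` contains no Python-level branching on traced values, the same function serves one environment or thousands without modification.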
Method
Mahjax implements Riichi Mahjong as a vectorized JAX environment, so that rollouts run in parallel on GPUs and agents can be trained and evaluated against baseline policies.
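A rough sketch of what a parallel rollout and a throughput measurement could look like in JAX follows. Everything here is a stand-in: `env_step`, `policy`, the constants, and the reward are hypothetical and far simpler than a Mahjong transition, so the printed number says nothing about Mahjax's own performance. The sketch only shows the common structure of scanning over time with `jax.lax.scan`, mapping over games with `jax.vmap`, and timing a call after compilation has finished.

```python
import time
import jax
import jax.numpy as jnp

NUM_ENVS = 4096     # parallel games (illustrative value)
NUM_STEPS = 64      # steps per rollout (illustrative value)
NUM_ACTIONS = 34    # one toy action per tile kind

def env_step(state, action):
    """Toy stand-in transition: count chosen actions, arbitrary reward."""
    state = state.at[action].add(1)
    reward = (action == 0).astype(jnp.float32)
    return state, reward

def policy(key, state):
    """Uniform random baseline policy (ignores the state)."""
    return jax.random.randint(key, (), 0, NUM_ACTIONS)

def rollout(key):
    """Run one game for NUM_STEPS steps and return final state and total reward."""
    def body(state, step_key):
        action = policy(step_key, state)
        state, reward = env_step(state, action)
        return state, reward
    init_state = jnp.zeros(NUM_ACTIONS, dtype=jnp.int32)
    step_keys = jax.random.split(key, NUM_STEPS)
    final_state, rewards = jax.lax.scan(body, init_state, step_keys)
    return final_state, rewards.sum()

batched_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), NUM_ENVS)

# The first call includes compilation, so it is excluded from the timing.
final_states, returns = batched_rollout(keys)
jax.block_until_ready(returns)

start = time.perf_counter()
final_states, returns = batched_rollout(keys)
jax.block_until_ready(returns)  # wait for asynchronous dispatch to finish
elapsed = time.perf_counter() - start

steps_per_second = NUM_ENVS * NUM_STEPS / elapsed
print(f"{steps_per_second:,.0f} toy steps/s")
```

Calling `jax.block_until_ready` before stopping the timer matters because JAX dispatches work asynchronously; without it the measurement would capture only the time to enqueue the computation.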
In practice
- Train RL agents for complex imperfect-information games.
- Debug agent behavior using the visualization tool.
- Benchmark *tabula rasa* learning algorithms at scale.
Topics
- Reinforcement Learning
- JAX
- GPU Acceleration
- Riichi Mahjong
- Imperfect Information Games
- Simulation Environments
- Tabula Rasa Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.