Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

2026-05-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, medium

Summary

Mahjax is a GPU-accelerated Riichi Mahjong simulator written in JAX, designed to support reinforcement learning research, particularly *tabula rasa* approaches. Riichi Mahjong is a multi-player, imperfect-information game with high stochasticity and a complex state space, which makes it a challenging environment for AI. Prior approaches have relied on supervised learning from human play logs; Mahjax instead supports learning from scratch by parallelizing rollouts at large scale on GPUs. On eight NVIDIA A100 GPUs, the simulator reaches throughputs of up to 2 million steps per second under no-red rules and 1 million steps per second under red rules. It also includes a high-quality visualization tool for debugging. In the reported experiments, trained agents improve their rank against baseline policies.

Key takeaway

For Machine Learning Engineers building reinforcement learning agents for complex, imperfect-information games, Mahjax provides a GPU-accelerated JAX environment that runs millions of simulation steps per second. That throughput makes *tabula rasa* experiments faster to iterate on and lets agents train on more games in the same wall-clock time. Consider integrating Mahjax for large-scale experimentation and benchmarking.

Key insights

Mahjax provides a high-throughput, GPU-accelerated JAX environment for *tabula rasa* reinforcement learning in complex imperfect-information games.

Principles

Vectorized JAX environments enable massive RL parallelization.
Imperfect-information games challenge *tabula rasa* learning.
High-throughput simulation accelerates algorithm development.
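
The first principle can be illustrated with a minimal sketch. This is not Mahjax's API: the `reset` and `step` functions, the state layout, and the eight-tile deck are hypothetical stand-ins chosen only to show how `jax.vmap` and `jax.jit` turn a single-game function into one that advances many games at once.

```python
import jax
import jax.numpy as jnp

NUM_TILES = 8  # toy deck size (hypothetical; far smaller than a real Mahjong wall)


def reset(key):
    """Shuffle a toy deck and return the initial state for one game."""
    deck = jax.random.permutation(key, NUM_TILES)
    return {"deck": deck, "pos": jnp.int32(0), "score": jnp.int32(0)}


def step(state, action):
    """Draw the next tile; action 1 keeps it (adds its value), action 0 discards it."""
    tile = state["deck"][state["pos"]]
    score = state["score"] + jnp.where(action == 1, tile, 0)
    return {"deck": state["deck"], "pos": state["pos"] + 1, "score": score}


# vmap maps the single-game functions over a leading batch axis; jit compiles them.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
states = batched_reset(keys)                    # 1024 independent games
actions = jnp.ones(num_envs, dtype=jnp.int32)   # every game keeps its first tile
states = batched_step(states, actions)          # one call advances all games
```

Because the game logic is written for a single game and batched afterwards, the same code can run one environment for debugging or thousands for training without modification.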

Method

The article describes the implementation of Mahjax in JAX for vectorized Riichi Mahjong simulation, enabling GPU-accelerated parallel rollouts and agent training against baselines.
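
A parallel rollout of this kind might look like the sketch below, which assumes a toy counter environment and a uniform random policy rather than Mahjong or a learned agent. The names `env_step`, `rollout`, and `HORIZON` are illustrative. `jax.lax.scan` unrolls one episode in time, and `jax.vmap` runs many episodes side by side.

```python
import jax
import jax.numpy as jnp

HORIZON = 16  # steps per rollout (hypothetical)


def env_step(state, action):
    """Toy dynamics: a counter that moves by -1 or +1; the reward is the new value."""
    new_state = state + 2 * action - 1
    return new_state, new_state.astype(jnp.float32)


def rollout(key):
    """Run one episode under a uniform random policy."""
    def body(carry, step_key):
        state, total = carry
        action = jax.random.bernoulli(step_key).astype(jnp.int32)
        state, reward = env_step(state, action)
        return (state, total + reward), action

    step_keys = jax.random.split(key, HORIZON)
    init = (jnp.int32(0), jnp.float32(0.0))
    (final_state, total), actions = jax.lax.scan(body, init, step_keys)
    return final_state, total, actions


# Batch the whole episode: each key drives one independent rollout.
batched_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(42), 4096)
final_states, returns, actions = batched_rollout(keys)
```

In a training loop, the random policy would be replaced by a network forward pass inside `body`, and the collected actions and rewards would feed the policy update.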

In practice

Train RL agents for complex imperfect-information games.
Debug agent behavior using the visualization tool.
Benchmark *tabula rasa* learning algorithms at scale.
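
Before benchmarking at scale, it helps to measure raw environment throughput on your own hardware. The sketch below shows one common way to do this in JAX. The `batched_step` function is a hypothetical stand-in, not Mahjax's step function, and the helper name `measure_steps_per_second` is an assumption. The details that matter are the warm-up call, which keeps compilation out of the timing, and `jax.block_until_ready`, which accounts for JAX's asynchronous dispatch.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def batched_step(states, actions):
    """Stand-in for a vectorized environment step (hypothetical toy dynamics)."""
    return states + actions


def measure_steps_per_second(step_fn, states, actions, num_iters=100):
    # Warm-up call so compilation time is excluded from the measurement.
    states = step_fn(states, actions)
    jax.block_until_ready(states)

    start = time.perf_counter()
    for _ in range(num_iters):
        states = step_fn(states, actions)
    # JAX dispatches work asynchronously; wait for the device to finish.
    jax.block_until_ready(states)
    elapsed = time.perf_counter() - start

    # Each call advances every environment in the batch by one step.
    return num_iters * actions.shape[0] / elapsed, states


num_envs = 8192
states = jnp.zeros(num_envs, dtype=jnp.int32)
actions = jnp.ones(num_envs, dtype=jnp.int32)
steps_per_second, final_states = measure_steps_per_second(batched_step, states, actions)
print(f"{steps_per_second:,.0f} env steps/sec")
```

Numbers from a toy step like this are not comparable to the figures reported for Mahjax; the same harness would need to wrap the real environment step to produce a meaningful comparison.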

Topics

Reinforcement Learning
JAX
GPU Acceleration
Riichi Mahjong
Imperfect Information Games
Simulation Environments
Tabula Rasa Learning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.