How fast is 10 tokens per second really?
Summary
Mike Veerman has built a simple HTML app that simulates Large Language Model (LLM) token output at a chosen speed, from a slow 5 tokens per second up to a very rapid 800 tokens per second. It is useful for anyone who sees an advertised model speed, such as "30 tokens/second," and wants to know what that rate feels like in use. By turning an abstract number into a real-time simulation, it makes the difference between inference rates easy to judge by eye. The application's source code is publicly available on GitHub, and it was posted on May 20th, 2026.
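The pacing idea behind such a simulator can be sketched in a few lines. This is a minimal illustration, not the code from Mike Veerman's app: the function names `emit_offsets` and `stream` are hypothetical, and whitespace-separated words stand in for tokens, which real tokenizers split differently.

```python
import sys
import time


def emit_offsets(token_count, tokens_per_second):
    """Return the time offset (in seconds) at which each token should appear."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    interval = 1.0 / tokens_per_second
    return [i * interval for i in range(token_count)]


def stream(tokens, tokens_per_second, sleep=time.sleep, write=sys.stdout.write):
    """Write tokens one at a time, paced to the requested rate."""
    start = time.monotonic()
    offsets = emit_offsets(len(tokens), tokens_per_second)
    for token, offset in zip(tokens, offsets):
        # Wait until this token's scheduled time, measured from the start,
        # so small delays do not accumulate into drift.
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            sleep(delay)
        write(token)
```

Calling `stream(["Hello ", "world ", "again "], tokens_per_second=10)` would print one word roughly every 0.1 seconds. The `sleep` and `write` parameters are injectable only so the pacing logic can be exercised without real waiting.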
Key takeaway
AI Product Managers evaluating LLM performance claims can use Mike Veerman's token speed simulator to see what an advertised speed looks like in practice. The tool shows how a figure such as "30 tokens/second" translates into real interaction speed, which can inform product design and communication strategies.
Key insights
The HTML app visualizes LLM token output speeds from 5 to 800 tokens/second for practical understanding.
In practice
- Visualize advertised LLM speeds.
- Understand user experience of token generation.
- Compare different LLM inference rates.
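For a rough numeric comparison alongside the visual one, the arithmetic is simply response length divided by rate. The sketch below assumes a hypothetical 500-token response. The rates of 5, 30, and 800 tokens/second are the ones mentioned in the summary and 10 comes from the title; the helper names are illustrative only.

```python
RESPONSE_TOKENS = 500  # hypothetical response length, chosen for illustration


def seconds_to_generate(token_count, tokens_per_second):
    """Time needed to emit token_count tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return token_count / tokens_per_second


def comparison_table(rates, token_count=RESPONSE_TOKENS):
    """Pair each rate with the time it needs for the same response."""
    return [(rate, seconds_to_generate(token_count, rate)) for rate in rates]


for rate, secs in comparison_table([5, 10, 30, 800]):
    print(f"{rate:>4} tokens/s -> {secs:7.2f} s for {RESPONSE_TOKENS} tokens")
```

This assumes a constant generation rate and ignores time to first token, so it gives only a lower bound on how long a user waits.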
Topics
- LLM Performance
- Token Generation Speed
- User Experience Simulation
- Mike Veerman
- Web Applications
- Inference Speed
Best for: AI Engineer, Machine Learning Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.