How fast is 10 tokens per second really?

Source: Simon Willison's Weblog · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Novice, quick

Summary

Mike Veerman has built a simple HTML app that simulates how fast a Large Language Model (LLM) streams its output tokens. It turns an abstract performance figure into something you can watch. Speeds range from a slow 5 tokens per second to a very fast 800 tokens per second. The app is meant for anyone who sees an advertised speed such as "30 tokens/second" and wants to know what that actually feels like. Instead of comparing numbers, you watch text appear at that rate in real time. The app's source code is publicly available on GitHub, and it was posted on May 20th, 2026.
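The core of a speed simulator like this is a loop that waits 1/rate seconds between tokens. Below is a minimal Python sketch of that idea. It is not Veerman's implementation: his app is HTML, and its internals are not described here. The sketch assumes that whitespace-separated words stand in for tokens. Real LLM tokenizers split text into smaller sub-word pieces, so a real model produces more tokens per sentence than this counts. The function name `stream_tokens` is hypothetical. The `sleep` parameter exists only so the timing can be tested without actually waiting.

```python
import sys
import time
from typing import Callable, TextIO


def stream_tokens(
    text: str,
    tokens_per_second: float,
    out: TextIO = sys.stdout,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Write `text` one pseudo-token at a time at a fixed rate.

    Assumption: whitespace-separated words stand in for tokens; real LLM
    tokenizers use sub-word pieces. Returns the total delay requested, in seconds.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    delay = 1.0 / tokens_per_second  # seconds to wait before each token
    total = 0.0
    for i, token in enumerate(text.split()):
        sleep(delay)
        total += delay
        # Re-insert a single space between tokens, but not before the first one.
        out.write(token if i == 0 else " " + token)
        out.flush()  # show each token immediately instead of buffering
    out.write("\n")
    return total


if __name__ == "__main__":
    sample = "How fast is 10 tokens per second really? Watch and see."
    stream_tokens(sample, tokens_per_second=10)
```

Running the script streams the sample sentence at 10 tokens per second. Its 11 words take about 1.1 seconds to appear. Change `tokens_per_second` to 5 or 800 to compare the two ends of the app's range.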

Key takeaway

For AI Product Managers evaluating LLM performance claims: use Mike Veerman's token speed simulator to see what an advertised speed looks like in practice. Watching text stream at a given rate shows how "30 tokens/second" translates into the interaction a user actually experiences. That is more useful than the bare number when making product design decisions and when explaining performance to others.
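The arithmetic that links an advertised figure to the experience is short. The gap between tokens is 1/rate seconds, and a response of N tokens takes N/rate seconds. The sketch below also converts tokens to words using an assumed rule of thumb of about 0.75 English words per token. The real ratio depends on the tokenizer and the language. The function name `describe_speed` and the 500-token default response length are illustrative choices, not values taken from the app.

```python
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English; varies by tokenizer and language


def describe_speed(tokens_per_second: float, response_tokens: int = 500) -> dict:
    """Translate a tokens/second figure into more tangible numbers (hypothetical helper)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return {
        # Time between two consecutive tokens appearing on screen.
        "ms_per_token": 1000.0 / tokens_per_second,
        # Approximate output rate in words, using the assumed words-per-token ratio.
        "words_per_minute": tokens_per_second * WORDS_PER_TOKEN * 60,
        # Time to finish streaming a response of the given length.
        "seconds_for_response": response_tokens / tokens_per_second,
    }


if __name__ == "__main__":
    for tps in (5, 10, 30, 800):
        s = describe_speed(tps)
        print(
            f"{tps:>3} tok/s: {s['ms_per_token']:.1f} ms/token, "
            f"~{s['words_per_minute']:.0f} words/min, "
            f"{s['seconds_for_response']:.1f} s for 500 tokens"
        )
```

At 30 tokens/second this works out to roughly 33 ms per token, about 1,350 words per minute under the 0.75 words-per-token assumption, and about 17 seconds for a 500-token answer.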

Key insights

The HTML app lets you watch LLM output stream at any speed from 5 to 800 tokens/second. You can judge an advertised figure by how it feels, not just by the number.

Best for: AI Engineer, Machine Learning Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.