I Asked ChatGPT, Claude and DeepSeek to Build Tetris

2025-12-22 · Source: KDnuggets · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

An editorial analyst ran a practical test comparing the code generation capabilities of three flagship large language models: Claude Opus 4.5, GPT-5.2 Pro, and DeepSeek V3.2. The goal was to see whether each could build a fully functional Tetris game from a single, detailed prompt, judged on first-attempt success, feature completeness, playability, and cost-effectiveness. Claude Opus 4.5 delivered a complete, smooth, and playable game in under 2 minutes. GPT-5.2 Pro, despite being OpenAI's most intelligent model and 4x more expensive than Opus 4.5, produced a game with a layout bug on its first attempt; a follow-up prompt fixed the bug, but the result still played less smoothly. DeepSeek V3.2, the most affordable option, had multiple bugs, including disappearing pieces and scrolling issues, and the game remained unplayable even after a second iteration. The cost analysis put Opus 4.5 at ~$0.09 for a playable game, GPT-5.2 Pro at ~$0.41 for a playable game with poor UX, and DeepSeek V3.2 at ~$0.005 for an unplayable game.
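To make the cost comparison concrete, the short Python sketch below recomputes relative costs from the approximate totals quoted in the summary. The dictionary layout and the `cost_ratio` helper are illustrative assumptions, not part of the original test. On these totals GPT-5.2 Pro comes out at roughly 4.6 times the cost of Opus 4.5; the "4x" figure in the summary may refer to list pricing rather than to these totals, which is also an assumption.

```python
# Approximate total costs (USD) and outcomes as reported in the summary.
# The structure and helper names here are illustrative, not from the article.
reported = {
    "Claude Opus 4.5": {"cost_usd": 0.09, "playable": True},
    "GPT-5.2 Pro": {"cost_usd": 0.41, "playable": True},   # playable, poor UX
    "DeepSeek V3.2": {"cost_usd": 0.005, "playable": False},
}

BASELINE = "Claude Opus 4.5"


def cost_ratio(model: str, baseline: str = BASELINE) -> float:
    """Return the model's reported cost as a multiple of the baseline's cost."""
    return reported[model]["cost_usd"] / reported[baseline]["cost_usd"]


if __name__ == "__main__":
    for name, result in reported.items():
        print(
            f"{name}: ${result['cost_usd']:.3f} "
            f"({cost_ratio(name):.2f}x {BASELINE}), "
            f"playable={result['playable']}"
        )
```

A low ratio only matters if the output is usable: DeepSeek V3.2 costs about one eighteenth as much as Opus 4.5 here, but the summary reports its game as unplayable.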

Key takeaway

For AI engineers and software developers evaluating LLMs for code generation, prioritize Claude Opus 4.5 for day-to-day coding tasks: its high first-attempt success and superior output quality ultimately save time and cost. If your project has a tight budget and you have debugging capacity, DeepSeek V3.2 is a low-cost alternative even across multiple iterations, though in this test two attempts did not yield a playable game. Avoid GPT-5.2 Pro for simple coding: its strengths lie in complex reasoning, so it is overpowered and less cost-efficient for straightforward development.

Key insights

Model performance for coding tasks varies significantly across LLMs, impacting development cost and user experience.

Principles

Higher cost does not guarantee superior code generation.
First-attempt success reduces overall development cost.
Model suitability depends on task complexity and budget.

Method

Evaluate LLMs for code generation by prompting them to build a complex, interactive application (e.g., Tetris) and assessing first-attempt success, feature completeness, playability, and total cost across iterations.
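One way to record such an evaluation is sketched below in Python. The `Attempt` and `ModelRun` classes and their field names are hypothetical, chosen only to mirror the criteria in the method (first-attempt success, playability, total cost across iterations); the article does not describe any tooling, and the playability flag is assumed to come from a manual check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    """One prompt/response round for a model (hypothetical record)."""
    cost_usd: float   # API cost of this round
    playable: bool    # result of a manual playability check


@dataclass
class ModelRun:
    """All attempts made with one model, in order."""
    model: str
    attempts: List[Attempt] = field(default_factory=list)

    @property
    def first_attempt_success(self) -> bool:
        # True only if the very first attempt was already playable.
        return bool(self.attempts) and self.attempts[0].playable

    @property
    def total_cost_usd(self) -> float:
        # Cost summed over every iteration, including failed ones.
        return sum(a.cost_usd for a in self.attempts)

    @property
    def iterations_to_playable(self) -> Optional[int]:
        # 1-based index of the first playable attempt, or None if never playable.
        for i, attempt in enumerate(self.attempts, start=1):
            if attempt.playable:
                return i
        return None
```

For example, with placeholder numbers that are not taken from the article, `ModelRun("model-a", [Attempt(0.02, False), Attempt(0.03, True)])` reports no first-attempt success, two iterations to a playable result, and a total cost that includes the failed first round.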

In practice

Use Claude Opus 4.5 for reliable daily coding tasks.
Consider DeepSeek V3.2 for budget-constrained projects.
Reserve GPT-5.2 Pro for complex reasoning tasks.

Topics

AI Model Comparison
Code Generation
Large Language Model Performance
Cost-effectiveness
Game Development

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by KDnuggets.