How to test a new AI model to get up to speed with its capabilities quickly
Summary
The "Model Sense" concept, highlighted by Google's AI product leader and by Box CEO Aaron Levie, holds that product builders need to understand how AI models behave in order to build reliable AI-powered products. This Knowledge Series guide addresses the disorientation caused by the broad scope of Large Language Models (LLMs) by outlining product-oriented tasks for testing new models. It proposes setting up a dedicated "Model Sense Notebook" to record and track model testing work, with practical methods for experimenting with generative AI, coding, prototyping, UX, data analysis, and API capabilities. The guide also covers evaluations, guardrails, and identifying "Frontier opportunities": capabilities that are nascent today but likely to mature within six months. A downloadable notebook template is provided to help track model capabilities and performance over time.
Key takeaway
AI Product Managers evaluating new AI models should prioritize developing "Model Sense" by testing models hands-on in a structured way. Keep a dedicated "Model Sense Notebook" to log experiments, track performance, and identify emerging capabilities, so that your product roadmap includes reliable AI features and anticipates future advancements.
Key insights
Developing "Model Sense" is crucial for product builders to create reliable AI-powered products.
Principles
- Hands-on testing reveals true model capabilities.
- Track model performance over time.
- Identify future "Frontier opportunities."
Method
Establish a "Model Sense Notebook" to log model tests, prompts, frontier opportunities, changelogs, and evaluation suites. Conduct practical experiments across generative AI, coding, UX, data analysis, and API capabilities.
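As an illustration of the method above, the notebook can be thought of as a small structured log. The following is a minimal sketch in Python, not the guide's actual template: the class names (`ExperimentEntry`, `ModelSenseNotebook`), the field names, and the pass/fail scoring are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExperimentEntry:
    """One logged model test (hypothetical schema, not the guide's template)."""
    model: str     # model name as you refer to it, e.g. "Claude Sonnet 4.5"
    area: str      # e.g. "coding", "ux", "data analysis", "api"
    prompt: str    # the exact prompt used, so the test can be repeated
    outcome: str   # free-form notes on what the model produced
    passed: bool   # your own judgment of whether the result was usable
    run_on: date = field(default_factory=date.today)


@dataclass
class ModelSenseNotebook:
    """Holds test entries, frontier opportunities, and a simple changelog."""
    entries: list[ExperimentEntry] = field(default_factory=list)
    frontier_opportunities: list[str] = field(default_factory=list)
    changelog: list[str] = field(default_factory=list)

    def log(self, entry: ExperimentEntry) -> None:
        """Record a test and note it in the changelog."""
        self.entries.append(entry)
        self.changelog.append(
            f"{entry.run_on.isoformat()}: tested {entry.model} on {entry.area}"
        )

    def pass_rate_by_area(self, model: str) -> dict[str, float]:
        """Share of passed tests per capability area for one model."""
        counts = defaultdict(lambda: [0, 0])  # area -> [passed, total]
        for entry in self.entries:
            if entry.model == model:
                counts[entry.area][0] += int(entry.passed)
                counts[entry.area][1] += 1
        return {area: passed / total for area, (passed, total) in counts.items()}
```

Re-running the same prompts against each new model release and comparing `pass_rate_by_area` between models is one simple way to track performance over time; a Notion table or spreadsheet with the same columns would serve equally well.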
In practice
- Create a Notion or Google Doc for model testing.
- Use the provided Model Sense Notebook template.
- Test Claude Sonnet 4.5 for HTML calculator generation.
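For the HTML calculator test, one way to make results comparable across models is to run the same automated sanity check on whatever HTML each model returns. The sketch below is an assumption about how such a check might look, not a procedure from the original article: the function name `check_calculator_html` and the specific criteria (digit buttons, operator buttons, an equals button, a script tag) are hypothetical, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class _ButtonCollector(HTMLParser):
    """Collects the text label of every <button> and notes any <script> tag."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.has_script = False
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._in_button = True
            self.labels.append("")
        elif tag == "script":
            self.has_script = True

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.labels[-1] += data.strip()


def check_calculator_html(html: str) -> dict[str, bool]:
    """Run basic structural checks on model-generated calculator HTML.

    The criteria are illustrative assumptions; a model may legitimately use
    other labels (for example "×" or "÷"), so treat failures as prompts for
    a manual look rather than as a verdict.
    """
    collector = _ButtonCollector()
    collector.feed(html)
    collector.close()
    labels = set(collector.labels)
    return {
        "has_all_digits": set("0123456789") <= labels,
        "has_operators": {"+", "-", "*", "/"} <= labels,
        "has_equals": "=" in labels,
        "has_script": collector.has_script,
    }
```

A structural check like this does not replace opening the page and clicking through it; it only gives a quick, repeatable signal to log next to your manual notes.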
Topics
- Model Sense
- AI Product Development
- Large Language Models
- Model Evaluation
- Generative AI
Best for: AI Product Manager, Product Manager, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Department of Product.