How to test a new AI Model to get up to speed with its capabilities quickly

2025-12-16 · Source: Department of Product · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Data Science & Analytics · Depth: Intermediate, quick

Summary

The "Model Sense" concept, highlighted by Google's AI product leader and by Box CEO Aaron Levie, holds that product builders need to understand how AI models behave in order to create reliable AI-powered products. This Knowledge Series guide addresses the disorientation caused by the broad scope of Large Language Models (LLMs) by outlining product-oriented tasks for testing new models. It proposes setting up a dedicated "Model Sense Notebook" to store and monitor model testing work, with practical methods for experimenting with generative AI, coding capabilities, prototyping, UX, data analysis, and API functionality. The guide also covers evaluations, guardrails, and "Frontier opportunities": capabilities that are nascent today but likely to mature within six months. A downloadable notebook template is provided to help track model capabilities and performance over time.

Key takeaway

AI Product Managers evaluating new AI models should prioritize developing "Model Sense" by testing models hands-on in a structured way. Keep a dedicated "Model Sense Notebook" to log experiments, track performance, and identify emerging capabilities, so that your product roadmap relies on AI features that work today and anticipates future advancements.

Key insights

Developing "Model Sense" is crucial for product builders to create reliable AI-powered products.

Principles

Hands-on testing reveals true model capabilities.
Track model performance over time.
Identify future "Frontier opportunities."

Method

Establish a "Model Sense Notebook" to log model tests, prompts, frontier opportunities, changelogs, and evaluation suites. Conduct practical experiments across generative AI, coding, UX, data analysis, and API capabilities.
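
The guide's notebook is a Notion page or Google Doc, but the same structure can be kept as data if you prefer to track results programmatically. The following is a minimal sketch under that assumption, not the article's template: the class names, fields, and category labels (`TestEntry`, `ModelSenseNotebook`, `frontier`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class TestEntry:
    """One logged experiment. All field names are illustrative assumptions."""
    model: str            # model under test, e.g. "Claude Sonnet 4.5"
    category: str         # e.g. "coding", "ux", "data analysis", "api"
    prompt: str           # the exact prompt used, so the test can be re-run
    passed: bool          # did the output meet your own acceptance bar?
    notes: str = ""
    frontier: bool = False  # promising but not yet reliable
    run_on: date = field(default_factory=date.today)


@dataclass
class ModelSenseNotebook:
    entries: List[TestEntry] = field(default_factory=list)

    def log(self, entry: TestEntry) -> None:
        self.entries.append(entry)

    def pass_rate(self, model: str) -> Dict[str, float]:
        """Fraction of passed tests per category for one model."""
        totals: Dict[str, int] = defaultdict(int)
        passes: Dict[str, int] = defaultdict(int)
        for e in self.entries:
            if e.model == model:
                totals[e.category] += 1
                passes[e.category] += int(e.passed)
        return {cat: passes[cat] / totals[cat] for cat in totals}

    def frontier_opportunities(self) -> List[TestEntry]:
        """Entries flagged as worth re-testing on future model releases."""
        return [e for e in self.entries if e.frontier]
```

Re-running the same prompts against each new model release and comparing `pass_rate` across models gives a rough view of performance over time, and the `frontier` flag marks tests to revisit first.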

In practice

Create a Notion page or Google Doc for model testing.
Use the provided Model Sense Notebook template.
Test Claude Sonnet 4.5 by asking it to generate an HTML calculator.
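
A test like the HTML calculator becomes repeatable once it has a pass/fail check you can record in the notebook. The sketch below is one possible check, not something the original article prescribes: it assumes you have already saved the model's HTML output as a string, and the acceptance criterion (a `<button>` for every digit and basic operator) and the function name `missing_calculator_buttons` are illustrative assumptions. It uses only Python's standard `html.parser` module and makes no model API call.

```python
from html.parser import HTMLParser
from typing import List


class ButtonCollector(HTMLParser):
    """Collects the visible text label of every <button> element."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: List[str] = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._in_button = True
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.labels[-1] += data.strip()


def missing_calculator_buttons(html: str) -> List[str]:
    """Return the required button labels absent from the generated HTML.

    The required set is an assumed acceptance bar for this sketch;
    an empty result means the check passed.
    """
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    required = set("0123456789+-*/=")
    return sorted(required - set(parser.labels))
```

A structural check like this only confirms that the expected controls exist. Whether the calculator computes correctly still needs a manual try or a browser-based test, which is worth noting alongside the result.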

Topics

Model Sense
AI Product Development
Large Language Models
Model Evaluation
Generative AI

Best for: AI Product Manager, Product Manager, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Department of Product.