Temporal Stability and Few-Shot Prompting in Math Task Assessment

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI in Education · Depth: Intermediate, quick

Summary

A longitudinal study examined how stable AI tools are over time, and how much few-shot prompting helps, when classifying the cognitive demand of mathematics tasks with the Task Analysis Guide (TAG). Researchers tested a general-purpose tool, Gemini, and an education-specific tool, Coteach. Model version updates alone did not improve either tool: Gemini's accuracy held at 58%, while Coteach's fell from 75% to 50%. Few-shot prompting with two exemplar tasks per category improved both tools, raising Gemini from 58% to 67% and Coteach from 50% back to 75%. These findings suggest that prompt engineering offers more reliable improvements than passive model updates, and that version updates may not consistently improve performance on specialized educational tasks.
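The reported accuracies are easier to compare as percentage-point changes. The short sketch below only restates the figures from the summary; the dictionary layout and the function name `point_changes` are illustrative choices, and it assumes few-shot prompting was applied to the updated model versions.

```python
# Accuracies reported in the summary, in percent.
# Assumption: the few-shot condition was run on the updated model versions.
reported = {
    "Gemini": {"baseline": 58, "after_update": 58, "few_shot": 67},
    "Coteach": {"baseline": 75, "after_update": 50, "few_shot": 75},
}


def point_changes(acc):
    """Return percentage-point differences between the three test conditions."""
    return {
        "update_vs_baseline": acc["after_update"] - acc["baseline"],
        "few_shot_vs_update": acc["few_shot"] - acc["after_update"],
        "few_shot_vs_baseline": acc["few_shot"] - acc["baseline"],
    }


for tool, acc in reported.items():
    print(tool, point_changes(acc))
```

Read this way, Gemini gains 9 points from few-shot prompting, while Coteach loses 25 points after the update and recovers the same 25 with few-shot prompting, which returns it to its baseline rather than above it.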

Key takeaway

Educators and researchers selecting or implementing AI tools for specialized educational tasks should prioritize robust prompt engineering over relying solely on model version updates. Re-test tool performance after every update, because passive improvements are not guaranteed and accuracy can even degrade. In this study, effort spent on crafting effective prompts produced more consistent gains than waiting for new model versions.

Key insights

Prompt engineering offers more reliable AI performance improvements on specialized tasks than passive model version updates.

Method

The study tested Gemini and Coteach at baseline, after model version updates, and then with few-shot prompting (two exemplars per cognitive demand category) for math task classification.
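A few-shot prompt of the kind described above could be assembled roughly as follows. This is a minimal sketch, not the study's actual prompt: the four category names are the standard Task Analysis Guide levels, but the prompt wording, the `Exemplar` type, and the functions `build_few_shot_prompt` and `accuracy` are hypothetical, and any exemplar tasks passed in would be your own.

```python
from dataclasses import dataclass

# The four cognitive demand levels of the Task Analysis Guide.
# Assumption: the study's exact label wording may differ.
TAG_CATEGORIES = [
    "Memorization",
    "Procedures without connections",
    "Procedures with connections",
    "Doing mathematics",
]


@dataclass
class Exemplar:
    task: str      # text of an already classified mathematics task
    category: str  # one of TAG_CATEGORIES


def build_few_shot_prompt(exemplars, new_task, per_category=2):
    """Build a classification prompt with `per_category` exemplars per level."""
    lines = ["Classify the cognitive demand of the mathematics task "
             "using one of these categories:"]
    lines += [f"- {c}" for c in TAG_CATEGORIES]
    lines.append("")
    for category in TAG_CATEGORIES:
        chosen = [e for e in exemplars if e.category == category][:per_category]
        if len(chosen) < per_category:
            raise ValueError(f"need {per_category} exemplars for {category!r}")
        for e in chosen:
            lines += [f"Task: {e.task}", f"Category: {e.category}", ""]
    # The task to classify comes last, with the label left open for the model.
    lines += [f"Task: {new_task}", "Category:"]
    return "\n".join(lines)


def accuracy(predicted, expert):
    """Fraction of tasks where the tool's label matches the expert label."""
    if not expert or len(predicted) != len(expert):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == e for p, e in zip(predicted, expert)) / len(expert)
```

Keeping the exemplar set and prompt template fixed makes it possible to re-run the same evaluation after each model update and compare `accuracy` values across versions.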

Best for: AI Engineer, Machine Learning Engineer, NLP Engineer, AI Scientist, Research Scientist, Consultant

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.