Live Q&A With the Hosts: Measuring Progress Toward AGI - Cognitive Abilities Hackathon

2026-04-01 · Source: Kaggle · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Research Methodology & Innovation · Depth: Advanced, extended

Summary

A live stream for the "Measuring Progress Toward AGI" hackathon, co-hosted by Nick from Kaggle and Ryan Bernell and Oren Kelly from Google DeepMind, covered product updates, the hackathon's evaluation focus, and submission rules. Nick announced new Kaggle Benchmarks features: OpenAI model availability (GPT-5.4 and OSS models), an improved model selection panel, automated notifications, rich metadata support in the SDK, enhanced navigation, cross-user runs, and task versioning. Ryan then detailed the hackathon's focus on five cognitive faculties from their paper (Learning, Metacognition, Attention, Executive Functions, and Social Cognition) and emphasized the need for robust, targeted evaluations. The event also covered submission guidelines: the April 16th deadline, a six-week human judging period, and the requirement that each submission focus on a single faculty track. All benchmarks become public under the Apache 2.0 license after the deadline.

Key takeaway

If you are an AI scientist or machine learning engineer developing AGI evaluations, focus your hackathon submission on isolating a specific cognitive faculty, such as Metacognition or Executive Functions, rather than on general problem-solving. Include robustness checks and vary task difficulty to reveal nuanced model capabilities; both directly affect your submission's discriminatory power and overall score.

Key insights

Robust AGI evaluation requires targeted benchmarks isolating specific cognitive faculties, moving beyond general problem-solving.

Principles

Benchmarks must isolate specific cognitive abilities.
Vary task difficulty to test model limits.
Robust signal is more critical than dataset size.
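
One way to act on the "vary task difficulty" principle is to report accuracy per difficulty level instead of a single aggregate score. The sketch below is a minimal illustration in plain Python; the `(difficulty, is_correct)` record shape, the function name, and the sample data are assumptions made for this example and are not part of the Kaggle Benchmarks SDK.

```python
from collections import defaultdict


def accuracy_by_difficulty(results):
    """Return accuracy per difficulty level.

    `results` is an iterable of (difficulty, is_correct) pairs. This record
    shape is an assumption for the sketch, not a Kaggle Benchmarks format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for difficulty, is_correct in results:
        totals[difficulty] += 1
        correct[difficulty] += int(is_correct)
    # Sort levels so the output reads as a difficulty curve.
    return {level: correct[level] / totals[level] for level in sorted(totals)}


# Hypothetical graded outcomes for one model at three difficulty levels.
sample_results = [
    (1, True), (1, True), (1, True), (1, True),
    (2, True), (2, True), (2, False), (2, True),
    (3, False), (3, True), (3, False), (3, False),
]
curve = accuracy_by_difficulty(sample_results)
print(curve)  # {1: 1.0, 2: 0.75, 3: 0.25}
```

A curve that stays flat at either extreme suggests the tasks are too easy or too hard to separate models, which is the kind of weak signal that a larger dataset alone does not fix.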

Method

Design benchmarks to avoid models simply looking up answers; instead, force manipulation of information or focus on specific, non-obvious aspects. Combine manual and synthetic data generation for quality and variety.
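
As a rough illustration of the method above, the following sketch generates synthetic items whose answers cannot be read off the prompt: the model has to track a sequence of swaps. It then mixes them with hand-written items. Everything here (the swap task, the `make_swap_item` and `build_dataset` helpers, and the field names) is hypothetical and chosen for the example; it is not the hosts' task design.

```python
import random

COLORS = ["red", "blue", "green", "yellow", "black", "white"]


def make_swap_item(rng, n_slots=4, n_swaps=3):
    """Build one item whose answer requires state tracking, not lookup.

    n_slots must be between 2 and len(COLORS); n_swaps acts as a
    difficulty knob.
    """
    names = COLORS[:n_slots]
    slots = list(names)  # slots[i] is the box currently in position i
    steps = []
    for _ in range(n_swaps):
        i, j = rng.sample(range(n_slots), 2)  # two distinct positions
        slots[i], slots[j] = slots[j], slots[i]
        steps.append(f"Swap positions {i + 1} and {j + 1}.")
    target = rng.randrange(n_slots)
    prompt = (
        "Boxes in order: " + ", ".join(names) + ". "
        + " ".join(steps)
        + f" Which box is now in position {target + 1}?"
    )
    return {
        "prompt": prompt,
        "answer": slots[target],
        "source": "synthetic",
        "n_swaps": n_swaps,
    }


def build_dataset(manual_items, n_synthetic, seed=0):
    """Combine hand-written items with seeded synthetic ones."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    synthetic = [make_swap_item(rng) for _ in range(n_synthetic)]
    return list(manual_items) + synthetic


# Hypothetical hand-written item, kept alongside the generated ones.
manual_items = [
    {
        "prompt": "A note lists Ana, Ben, Cy. Reverse the order, then name the second person.",
        "answer": "Ben",
        "source": "manual",
        "n_swaps": None,
    },
]
dataset = build_dataset(manual_items, n_synthetic=5)
print(len(dataset), dataset[1]["prompt"])
```

Seeding the generator keeps the synthetic portion reproducible across runs, while the manual items supply cases that a template would not produce.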

In practice

Use OpenAI models now available in Kaggle Benchmarks.
Use the SDK's rich metadata support for deeper analysis.
Run test-retest reliability checks to confirm benchmark robustness.
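
A test-retest check can be sketched as follows: score the same items in two independent runs of the same model, then compare the per-item results. The example uses plain Python with hypothetical 0/1 scores; the function names and the choice of agreement rate plus Pearson correlation are assumptions for illustration, not a method prescribed by the hosts or an SDK feature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a run has constant scores")
    return cov / math.sqrt(var_x * var_y)


def retest_reliability(run_a, run_b):
    """Compare per-item scores from two runs over the same items."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and cover the same items")
    agreement = sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)
    return {"agreement": agreement, "pearson_r": pearson(run_a, run_b)}


# Hypothetical per-item scores (1 = correct) from two runs of one model.
run_a = [1, 1, 0, 1, 0, 0, 1, 1]
run_b = [1, 1, 0, 1, 0, 1, 1, 1]
report = retest_reliability(run_a, run_b)
print(report)
```

Low agreement between runs indicates that item-level outcomes are dominated by sampling noise, so differences between models on that benchmark should be read with caution.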

Topics

AGI Cognitive Framework
Kaggle Benchmarks
Cognitive Abilities Hackathon
Large Language Model Evaluation
Metacognition Benchmarking

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Kaggle.