Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Almanac is a human collaboration dataset designed to guide Large Language Model (LLM) agents toward process-level collaborative competence rather than task completion alone. Built on the classic Map Task, it contains 2,987 collaboration actions from 50 participants across 25 dyadic sessions. Each action is paired with theory-informed mental model annotations covering the participant's self-reasoning, perceived partner intent, and perceived team goal. The authors benchmarked six LLMs (Qwen3.6-35B-A3B, Llama 3.3 70B, GPT-5.5, Claude 4.6 Sonnet, Qwen3-4B, Qwen3-30B-A3B) on next-turn behavior prediction and mental model prediction. The results indicate that Almanac is useful for evaluating how well models simulate human collaborative behavior and infer the underlying mental models, and that fine-tuned smaller models approach the performance of larger proprietary models.

Key takeaway

For AI scientists and machine learning engineers building collaborative LLM agents, Almanac offers process-level supervision signals. Consider fine-tuning models on this dataset to improve their ability to infer a human partner's mental state, particularly shared goals and partner intent, which are more predictable than private self-reasoning. This can help agents move beyond task-solving toward genuine collaboration.
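As a rough illustration of what such fine-tuning data could look like, the sketch below turns a dialogue prefix and the next annotated action into a prompt/completion pair. The record fields, the role labels, the prompt wording, and the `build_example` helper are all assumptions made for illustration; they mirror the three annotation types named in the summary and are not the dataset's released schema or the authors' training format.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedAction:
    """One collaboration action plus its three mental model annotations.

    Field names are hypothetical; they follow the annotation types listed
    in the summary, not an official Almanac schema.
    """
    speaker: str          # assumed role label, e.g. "giver" or "follower"
    utterance: str        # the collaboration action itself
    self_reasoning: str   # why the speaker acted this way
    partner_intent: str   # what the speaker believes the partner intends
    team_goal: str        # what the speaker believes the team is pursuing


def build_example(history: list[AnnotatedAction], target: AnnotatedAction) -> dict:
    """Build one supervised pair: dialogue so far -> next action + mental model."""
    # Only utterances go into the context; annotations of earlier turns are
    # private to each participant, so they are left out of the prompt here.
    context = "\n".join(f"{a.speaker}: {a.utterance}" for a in history)
    prompt = (
        "Dialogue so far:\n"
        f"{context}\n"
        f"Predict the next action by {target.speaker} and the mental model behind it."
    )
    completion = (
        f"Action: {target.utterance}\n"
        f"Self-reasoning: {target.self_reasoning}\n"
        f"Partner intent: {target.partner_intent}\n"
        f"Team goal: {target.team_goal}"
    )
    return {"prompt": prompt, "completion": completion}
```

Keeping the three annotation types on separate labeled lines makes it straightforward to score each one independently, which matters if shared goals and partner intent turn out easier to predict than self-reasoning.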

Key insights

The Almanac dataset provides action-level mental model annotations for training LLM agents to collaborate effectively with humans.

Method

A two-step annotation framework combines in-session checkpoints (25%, 50%, 75% progress) with post-session retrospective labeling, using memory anchors to capture action-level mental models.
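To make the checkpoint schedule concrete, here is a minimal sketch that places the three checkpoints within a session and assigns each action to the segment it falls in. It assumes progress is measured by action count, which the summary does not state (the study may track task progress differently), and the function names are hypothetical.

```python
import bisect
import math

# Checkpoint fractions taken from the summary: 25%, 50%, 75% progress.
CHECKPOINT_FRACTIONS = (0.25, 0.50, 0.75)


def checkpoint_indices(n_actions: int) -> list[int]:
    """0-based index of the action after which each checkpoint would fire.

    Assumption: progress is proportional to the number of actions taken.
    """
    return [max(0, math.ceil(f * n_actions) - 1) for f in CHECKPOINT_FRACTIONS]


def segment_of(action_index: int, checkpoints: list[int]) -> int:
    """Number of checkpoints that fired strictly before this action (0 to 3).

    During retrospective labeling, this tells the annotator which
    in-session checkpoint most recently preceded the action.
    """
    return bisect.bisect_left(checkpoints, action_index)


checkpoints = checkpoint_indices(120)
print(checkpoints)                   # [29, 59, 89]
print(segment_of(45, checkpoints))   # 1
```

Under this assumption, an action in segment 1 would be labeled retrospectively with the 25% checkpoint as its nearest preceding reference point.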

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.