Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration
Summary
Almanac is a new human collaboration dataset designed to guide Large Language Model (LLM) agents toward process-level collaborative competence rather than task completion alone. Built from the classic Map Task, Almanac contains 2,987 collaboration actions from 50 participants across 25 dyadic sessions. Each action is paired with theory-informed mental model annotations that capture the participant's self-reasoning, perceived partner intent, and perceived team goal. The researchers benchmarked six LLMs (Qwen3.6-35B-A3B, Llama 3.3 70B, GPT-5.5, Claude 4.6 Sonnet, Qwen3-4B, Qwen3-30B-A3B) on next-turn behavior prediction and mental model prediction. The results show that Almanac is useful for evaluating how well models simulate human collaborative behavior and infer the underlying mental models, and that fine-tuned smaller models approach the performance of larger proprietary models.
Key takeaway
For AI Scientists and Machine Learning Engineers developing collaborative LLM agents, Almanac offers process-level supervision signals. Consider fine-tuning models on this dataset to improve their ability to infer a human partner's mental state. Shared goals and partner intent are the most promising targets, because they are more predictable than private self-reasoning. Doing so can help agents move beyond task-solving toward being genuine collaborative partners.
Key insights
The Almanac dataset provides action-level mental model annotations for training LLM agents to collaborate effectively with humans.
Principles
- Effective collaboration requires continuous mental model alignment.
- Observable behavior and mental models offer complementary signals.
Method
A two-step annotation framework combines in-session checkpoints (25%, 50%, 75% progress) with post-session retrospective labeling, using memory anchors to capture action-level mental models.
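To make the shape of an annotated action concrete, here is a minimal Python sketch of one record. It is not the dataset's actual schema: the class names, field names, and the `nearest_checkpoint` helper are hypothetical, chosen only to mirror the three annotation types and the three checkpoints described in this summary. Treating the closest checkpoint as the memory anchor for an action is also an assumption; the summary does not say how anchors are assigned.

```python
from dataclasses import dataclass
from typing import Optional

# Checkpoint positions named in the summary: 25%, 50%, and 75% task progress.
CHECKPOINTS = (0.25, 0.50, 0.75)


@dataclass
class MentalModel:
    """The three annotation types the summary lists (field names are hypothetical)."""
    self_reasoning: str   # why the participant took this action
    partner_intent: str   # what they believed their partner was trying to do
    team_goal: str        # what they believed the team was working toward


@dataclass
class AnnotatedAction:
    """One collaboration action plus its optional mental model annotation."""
    session_id: int
    speaker: str                      # hypothetical role label, e.g. "A" or "B"
    utterance: str
    progress: float                   # fraction of the session completed, in [0, 1]
    mental_model: Optional[MentalModel] = None


def nearest_checkpoint(progress: float) -> float:
    """Return the in-session checkpoint closest to an action.

    Assumption: the closest checkpoint is used as the memory anchor when the
    action is labeled retrospectively.
    """
    return min(CHECKPOINTS, key=lambda c: abs(c - progress))
```

Under this sketch, an action at 60% progress would be anchored to the 50% checkpoint, and the retrospective label would fill in `mental_model` for that action.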
In practice
- Benchmark LLMs on next-turn behavior prediction.
- Evaluate LLMs on mental model inference capabilities.
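The two evaluation tasks above can be framed as turning each session into (context, target) pairs. The Python sketch below shows one possible framing and is not the paper's protocol: the dictionary keys, the `window` parameter, and the toy session are all invented for illustration, and the summary does not state which prompts or metrics the authors used.

```python
from typing import Dict, List, Tuple

Action = Dict[str, str]  # hypothetical record: speaker, utterance, optional annotation keys


def _render(history: List[Action]) -> str:
    """Format a slice of the dialogue as plain text for a model prompt."""
    return "\n".join(f"{a['speaker']}: {a['utterance']}" for a in history)


def next_turn_examples(actions: List[Action], window: int = 4) -> List[Tuple[str, str]]:
    """Pair each action (except the first) with the preceding turns as context."""
    examples = []
    for i in range(1, len(actions)):
        context = _render(actions[max(0, i - window):i])
        examples.append((context, actions[i]["utterance"]))
    return examples


def mental_model_examples(
    actions: List[Action], field: str, window: int = 4
) -> List[Tuple[str, str]]:
    """Pair each annotated action with its annotation for one field.

    The context includes the action itself, since the task here is to infer
    the mental state behind an action that has already been observed.
    """
    examples = []
    for i, action in enumerate(actions):
        if field not in action:
            continue  # skip actions without an annotation for this field
        context = _render(actions[max(0, i - window + 1):i + 1])
        examples.append((context, action[field]))
    return examples


# Toy session, invented purely for illustration (not taken from the dataset).
toy_session = [
    {"speaker": "A", "utterance": "Start at the old mill.", "team_goal": "agree on the start"},
    {"speaker": "B", "utterance": "I don't see a mill on my map."},
    {"speaker": "A", "utterance": "Then head for the bridge.", "team_goal": "find a shared landmark"},
]

behavior_pairs = next_turn_examples(toy_session)
goal_pairs = mental_model_examples(toy_session, "team_goal")
```

Here `behavior_pairs` holds two examples (one for each turn after the first) and `goal_pairs` holds two (one per annotated action). The same builder could be reused for partner intent or self-reasoning by passing a different `field`.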
Topics
- LLM Agents
- Human-Agent Collaboration
- Mental Models
- Collaboration Datasets
- Map Task
- Behavior Prediction
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.