EgoCS-400K: An Egocentric Gameplay Dataset for World Models

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Robotics & Autonomous Systems) · Depth: Expert, quick

Summary

EgoCS-400K is a large-scale egocentric Counter-Strike dataset designed for interactive world modeling. It targets the data gap left by passive web videos, limited robotic datasets, and human-driven simulators. Built from public professional CS and CS2 match demos, it provides over 400,000 first-person videos and 10,000 hours of gameplay from more than 1,000 matches and 40,000 rounds, spanning 13 maps and 10 player viewpoints per round. From each demo, the pipeline extracts and temporally aligns player states, view directions, movements, keyboard/button inputs, view-angle changes, weapon usage, game events, and round-level context, and renders clean first-person videos. The dataset supports interactive visual modeling tasks such as action-conditioned future prediction, state- and event-aware scene rollout, replay-grounded captioning, and egocentric agent action understanding, making it a practical bridge for embodied AI research.
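The headline figures are mutually consistent, which a quick back-of-the-envelope check confirms. The sketch below uses only the rounded numbers quoted in the summary, so the derived average video length is an approximation and not a figure reported by the authors.

```python
# Rounded figures quoted in the summary ("over" / "more than" lower bounds).
rounds = 40_000
viewpoints_per_round = 10
total_hours = 10_000

# One first-person video per player viewpoint per round.
videos = rounds * viewpoints_per_round

# Approximate average length of a single video, in seconds.
avg_seconds_per_video = total_hours * 3600 / videos

print(videos)                 # 400000
print(avg_seconds_per_video)  # 90.0
```

Under these rounded inputs, each video covers roughly a minute and a half of play, which is on the order of a single Counter-Strike round.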

Key takeaway

For machine learning engineers building interactive world models, EgoCS-400K addresses data scarcity with 400,000 first-person videos and 10,000 hours of human gameplay. Its temporally aligned video-action-language trajectories support training for action-conditioned future prediction and egocentric agent action understanding, and they position the dataset between simulated and real-world embodied data. Consider integrating EgoCS-400K into your embodied AI research.
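To make "action-conditioned future prediction" concrete, here is a minimal sketch of how one aligned trajectory could be cut into training triples. The function name `make_samples`, the `context` parameter, and the convention that `actions[t]` leads from `frames[t]` to `frames[t + 1]` are all assumptions for illustration; the summary does not describe the dataset's actual loading API.

```python
def make_samples(frames, actions, context):
    """Yield (past_frames, past_actions, next_frame) triples.

    Assumed convention (not from the source): actions[t] is the input applied
    at frames[t], and frames[t + 1] is the resulting observation.
    """
    if context < 1:
        raise ValueError("context must be at least 1")
    if len(actions) != len(frames) - 1:
        raise ValueError("expected exactly one action between consecutive frames")

    # t is the index of the most recent frame in the context window.
    for t in range(context - 1, len(frames) - 1):
        start = t - context + 1
        yield frames[start:t + 1], actions[start:t + 1], frames[t + 1]


# Toy trajectory: strings stand in for image tensors and input vectors.
frames = ["f0", "f1", "f2", "f3"]
actions = ["a0", "a1", "a2"]
samples = list(make_samples(frames, actions, context=2))

for past_frames, past_actions, target in samples:
    print(past_frames, past_actions, "->", target)
```

A model trained on such triples predicts the next frame from recent frames plus the player's inputs, which is what distinguishes this setting from passive video prediction.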

Key insights

EgoCS-400K provides a large-scale, replay-grounded egocentric dataset from Counter-Strike gameplay, bridging data gaps for interactive world models.

Method

EgoCS-400K extracts player states, inputs, view changes, weapon usage, and game events from public CS/CS2 match demos, then renders clean first-person videos.
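As an illustration of what temporal alignment between replay data and rendered video involves, the sketch below pairs each video frame with the nearest replay sample and derives a view-angle change. Every name here (`TickRecord`, its fields, `nearest_record`, `view_delta`) is hypothetical, and the sample rate and frame rate are free parameters; the summary does not specify the dataset's schema or rates.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TickRecord:
    """One replay sample for one player. All field names are hypothetical."""
    time_s: float        # seconds since round start
    position: tuple      # (x, y, z) world position
    yaw: float           # horizontal view direction, degrees
    pitch: float         # vertical view direction, degrees
    buttons: frozenset   # pressed inputs, e.g. {"forward", "fire"}
    weapon: str          # currently held weapon


def nearest_record(records, frame_index, fps):
    """Return the record closest in time to a rendered video frame.

    `records` must be non-empty and sorted by time_s.
    """
    t = frame_index / fps
    times = [r.time_s for r in records]
    i = bisect_left(times, t)
    if i == 0:
        return records[0]
    if i == len(records):
        return records[-1]
    before, after = records[i - 1], records[i]
    return before if t - before.time_s <= after.time_s - t else after


def view_delta(prev, curr):
    """View-angle change between two records, yaw wrapped to [-180, 180)."""
    d_yaw = (curr.yaw - prev.yaw + 180.0) % 360.0 - 180.0
    return d_yaw, curr.pitch - prev.pitch


# Toy data: three samples half a second apart.
records = [
    TickRecord(0.0, (0, 0, 0), 170.0, 0.0, frozenset({"forward"}), "rifle"),
    TickRecord(0.5, (10, 0, 0), -170.0, 5.0, frozenset({"forward", "fire"}), "rifle"),
    TickRecord(1.0, (20, 0, 0), -160.0, 5.0, frozenset(), "rifle"),
]

frame_15 = nearest_record(records, frame_index=15, fps=30)  # t = 0.5 s
print(frame_15.buttons)
print(view_delta(records[0], records[1]))
```

Wrapping the yaw difference matters because a turn from 170 to -170 degrees is a 20-degree rotation, not a 340-degree one; an unwrapped delta would inject spurious large actions into the training signal.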

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.