Inside soccer’s data renaissance
Summary
Jesse Davis's Sports Analytics Lab at KU Leuven is at the forefront of soccer's data revolution, applying advanced machine learning to uncover tactical insights. A notable finding, presented in a 2024 paper titled "Boot It", demonstrates the strategic value of intentionally kicking the ball out of bounds on the opponent's side from the middle third of the pitch. The researchers simulated this seemingly counterintuitive move using tree ensemble models on a dataset of over 1.4 million passes and 60,000 throw-ins (partly from the 2022 World Cup), and the results suggest it can leave a team better placed to score within its next 10 actions. The lab's work extends to developing open-source analytics tools like VAEP and xG models, which see thousands of monthly downloads. Furthermore, Davis's team is actively researching methods to standardize in-game data using transformer neural networks, aiming to automate the currently manual and time-consuming process of tactical annotation, which can take up to six hours per game. Their research significantly impacts professional clubs across Europe, including Royal Sporting Club Anderlecht and Club Brugge KV.
Key takeaway
For soccer analysts and data scientists evaluating tactical decisions, consider building counterintuitive strategies like the "Boot It" approach into your models. Quantify the trade-off between keeping possession and gaining positional advantage, especially in the defensive third. Use machine learning frameworks to simulate and compare alternative actions, such as long clearances followed by aggressive pressing. This can reveal hidden value, potentially adding expected goals over a season, and inform more dynamic, data-driven coaching strategies.
Key insights
Machine learning can reveal counterintuitive soccer strategies, like "Boot It," by analyzing complex game data.
Principles
- Data analysis can uncover hidden tactical advantages.
- Sacrificing possession can create positional advantage.
- Standardizing data improves analytical efficiency.
Method
A framework uses an XGBoost model to compare observed backward passes with simulated long clearances. Each option is scored, from game state features, as the probability of scoring minus the probability of conceding within the next 10 actions.
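The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's implementation: the `GameState` fields, the pitch dimensions, and every function name are hypothetical, and `toy_model` is a hand-written stand-in with invented numbers where the real framework would use a trained XGBoost classifier.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

PITCH_LENGTH = 105.0  # metres; assumed pitch size
PITCH_WIDTH = 68.0


@dataclass(frozen=True)
class GameState:
    """Hypothetical, heavily simplified game state."""
    x: float             # distance from own goal line (0 = own goal)
    y: float             # distance from the left touchline
    in_possession: bool  # True if our team has the ball


# A model maps a state to (P(score), P(concede)) over the next 10 actions.
ProbModel = Callable[[GameState], Tuple[float, float]]


def toy_model(state: GameState) -> Tuple[float, float]:
    """Stand-in for a trained classifier; the coefficients are invented."""
    progress = state.x / PITCH_LENGTH
    if state.in_possession:
        return 0.01 + 0.04 * progress, 0.02 * (1.0 - progress)
    return 0.005 + 0.01 * progress, 0.005 + 0.03 * (1.0 - progress)


def net_value(model: ProbModel, state: GameState) -> float:
    """Probability of scoring minus probability of conceding."""
    p_score, p_concede = model(state)
    return p_score - p_concede


def simulate_boot_it(state: GameState, landing_x: float) -> GameState:
    """Counterfactual state: ball out over the nearer touchline, opponent throw-in."""
    touchline_y = 0.0 if state.y < PITCH_WIDTH / 2 else PITCH_WIDTH
    return GameState(x=landing_x, y=touchline_y, in_possession=False)


def boot_it_gain(model: ProbModel, before: GameState,
                 observed_after: GameState, landing_x: float) -> float:
    """Net value of the simulated clearance minus that of the observed pass."""
    simulated_after = simulate_boot_it(before, landing_x)
    return net_value(model, simulated_after) - net_value(model, observed_after)


# Usage: a backward pass that was actually played vs. a simulated long clearance.
before = GameState(x=30.0, y=20.0, in_possession=True)
observed_after = GameState(x=18.0, y=30.0, in_possession=True)
gain = boot_it_gain(toy_model, before, observed_after, landing_x=80.0)
print(f"Net value gained by booting it (toy numbers): {gain:+.4f}")
```

Because the model is passed in as a plain callable, a trained classifier could replace `toy_model` without changing the comparison logic. The sign of `gain` here reflects only the invented coefficients and says nothing about real matches.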
In practice
- Consider the "Boot It" strategy in the defensive third to gain a positional advantage.
- Apply pressure after long clearances to maximize benefit.
- Employ open-source VAEP and xG models for game analysis.
Topics
- Sports Analytics
- Machine Learning
- Soccer Tactics
- Data Standardization
- XGBoost
- Open-Source Tools
- Transformer Models
Best for: NLP Engineer, AI Scientist, Data Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.