Inside soccer’s data renaissance

· Source: MIT Technology Review · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Sports Analytics · Depth: Advanced, extended

Summary

Jesse Davis's Sports Analytics Lab at KU Leuven is at the forefront of soccer's data revolution, applying advanced machine learning to uncover tactical insights. A notable finding, presented in a 2024 paper titled "Boot it," shows the strategic value of intentionally kicking the ball out of bounds on the opponent's side from the middle third of the pitch. The lab simulated this seemingly counterintuitive move with tree ensemble models on a dataset of more than 1.4 million passes and 60,000 throw-ins (partly from the 2022 World Cup), scoring each option by the team's chance of scoring minus its chance of conceding over the next 10 actions. The lab also develops open-source analytics tools such as VAEP and xG models, which see thousands of monthly downloads. In addition, Davis's team is researching ways to standardize in-game data with transformer neural networks, with the aim of automating tactical annotation, a manual process that can currently take up to six hours per game. Their research significantly impacts professional clubs across Europe, including Royal Sporting Club Anderlecht and Club Brugge KV.

Key takeaway

For soccer analysts and data scientists evaluating tactical decisions, consider building counterintuitive strategies like the "Boot it" approach into your models. Quantify the trade-off between keeping possession and gaining positional advantage, especially for backward passes from the middle third of the pitch. Use machine learning frameworks to simulate alternative actions, such as a long clearance followed by aggressive pressing, and compare them with what players actually did. This can reveal hidden value, potentially adding expected goals over a season, and inform more dynamic, data-driven coaching strategies.

Key insights

Machine learning can reveal counterintuitive soccer strategies, such as "Boot it," by analyzing complex game data.

Method

The framework uses an XGBoost model to compare observed backward passes with simulated long clearances. For each option it estimates, from game-state features, the probability of scoring minus the probability of conceding within the next 10 actions.
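The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's implementation: `GameState`, `net_value`, `clearance_gain`, and the two `toy_` functions are hypothetical names, the feature set is reduced to ball position and possession, and the toy functions use made-up linear rules where a real pipeline would plug in trained classifiers (for example, XGBoost models fitted on event data).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GameState:
    """Hypothetical game-state features after an action has been played."""
    ball_x: float         # metres from own goal line (0) to opponent's goal line (105)
    ball_y: float         # metres across the pitch (0 to 68)
    own_possession: bool  # True if our team has the ball


# A probability model maps a game state to a probability in [0, 1].
ProbModel = Callable[[GameState], float]


def net_value(state: GameState, p_score: ProbModel, p_concede: ProbModel) -> float:
    """P(score within the next 10 actions) minus P(concede within the next 10 actions)."""
    return p_score(state) - p_concede(state)


def clearance_gain(after_pass: GameState, after_clearance: GameState,
                   p_score: ProbModel, p_concede: ProbModel) -> float:
    """Net value of the simulated clearance minus net value of the observed pass.

    A positive result means the simulated clearance is rated higher.
    """
    return (net_value(after_clearance, p_score, p_concede)
            - net_value(after_pass, p_score, p_concede))


def toy_p_score(state: GameState) -> float:
    """Toy stand-in for a trained model. Coefficients are invented, not fitted."""
    progress = state.ball_x / 105.0
    return (0.04 if state.own_possession else 0.02) * progress


def toy_p_concede(state: GameState) -> float:
    """Toy stand-in for a trained model. Coefficients are invented, not fitted."""
    progress = state.ball_x / 105.0
    return (0.02 if state.own_possession else 0.05) * (1.0 - progress)


if __name__ == "__main__":
    # Observed option: a backward pass that keeps the ball in our own half.
    after_pass = GameState(ball_x=35.0, ball_y=34.0, own_possession=True)
    # Simulated option: ball kicked out near the opponent's end, opponent throw-in.
    after_clearance = GameState(ball_x=90.0, ball_y=0.0, own_possession=False)

    gain = clearance_gain(after_pass, after_clearance, toy_p_score, toy_p_concede)
    # With toy models this number has no empirical meaning; it only shows the mechanics.
    print(f"Estimated gain from clearing instead of passing: {gain:+.4f}")
```

Passing the probability models in as plain callables keeps the comparison logic independent of the model library, so the toy functions can be swapped for the `predict_proba` output of a fitted classifier without changing `clearance_gain`.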

Best for: NLP Engineer, AI Scientist, Data Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.