[R] Analysis of 350+ ML competitions in 2025

2026-02-19 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

An analysis of over 350 machine learning competitions from 2025, compiled by mlcontests.com, reveals key trends in winning solutions across data types. Gradient-boosted decision trees (GBDTs) such as XGBoost and LightGBM still dominate tabular data, but AutoML packages (AutoGluon) and newer tabular deep learning models such as TabPFN (a tabular foundation model) and TabM are emerging. Compute budgets are increasing: one team used 512 H100s for 48 hours (an estimated $60k in cloud cost), though free compute options remain viable. Qwen2.5 and Qwen3 models were prevalent in language and reasoning tasks, largely replacing BERT-style models. Transformer-based models surpassed CNNs in vision competitions for the first time, and OpenAI's Whisper was frequently fine-tuned for speech tasks. PyTorch was used in 98% of deep learning solutions, with 20% also using PyTorch Lightning. Polars and JAX saw minimal adoption among winners.
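As a rough sanity check on the compute figure in the summary, this short Python sketch converts 512 H100s running for 48 hours into GPU-hours and derives the average hourly price implied by the ~$60k estimate. The function names are illustrative only, and any rate you pass to estimated_cost is your own assumption; real cloud prices vary by provider, region, and commitment.

```python
# Back-of-envelope check of the compute figure quoted in the summary.
# Inputs from the source: 512 H100 GPUs for 48 hours, ~$60,000 total.
# Function names are hypothetical helpers for illustration.

def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total accelerator time consumed, in GPU-hours."""
    return num_gpus * hours


def implied_hourly_rate(total_cost_usd: float, total_gpu_hours: float) -> float:
    """Average price per GPU-hour implied by a total cost estimate."""
    return total_cost_usd / total_gpu_hours


def estimated_cost(num_gpus: int, hours: float, rate_usd_per_gpu_hour: float) -> float:
    """Cloud cost for a run at an ASSUMED flat per-GPU-hour price."""
    return gpu_hours(num_gpus, hours) * rate_usd_per_gpu_hour


if __name__ == "__main__":
    total = gpu_hours(512, 48)
    rate = implied_hourly_rate(60_000, total)
    print(f"{total:,.0f} GPU-hours, about ${rate:.2f} per GPU-hour")
```

With these inputs the run works out to 24,576 GPU-hours, which implies roughly $2.44 per GPU-hour. Plugging a smaller GPU count and your own assumed rate into estimated_cost gives a comparable ballpark for a planned run.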

Key takeaway

For AI engineers building competitive ML solutions: prioritize PyTorch for deep learning and explore Qwen models for language tasks. GBDTs remain strong for tabular data, but AutoGluon and TabPFN are worth benchmarking against them. Expect compute demands to keep rising, and use efficiency tools such as vLLM (for inference) and Unsloth (for fine-tuning) to get more out of limited resources.

Key insights

ML competition trends show shifts towards foundation models and increased compute, while GBDTs and PyTorch maintain strong positions.

Principles

GBDTs remain strong for tabular data.
PyTorch is the dominant deep learning framework.
Efficiency tools such as vLLM (inference) and Unsloth (fine-tuning) stretch limited compute.

Method

Winning solutions for language tasks often fine-tune Qwen models, while winning solutions for speech tasks commonly fine-tune OpenAI's Whisper. In vision tasks, winners increasingly favor Transformer-based models over CNNs.

In practice

Consider AutoGluon or TabPFN for tabular data.
Start from Qwen models for text-related competitions.
Use vLLM for efficient inference and Unsloth for efficient fine-tuning.

Topics

Machine Learning Competitions
Tabular Models
Compute Resources
Language Models
Transformer Models

Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, Machine Learning Engineer, Data Scientist, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.