ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling
Summary
ScaleToT is a novel framework designed to generalize structured Large Language Model (LLM) reasoning for billion-scale low-activity user modeling, addressing the challenge of sparse interaction histories. It learns complex user-state inference from a small, LLM-processed data subset and extends this understanding to a wider user population. The system enhances reasoning reliability by constructing typed user-state chains through a bounded entropy-guided Tree-of-Thought (ToT) refinement procedure. These teacher-curated chains then train a student model on static profiles using supervised fine-tuning (SFT) and Outcome-Driven Segment-Aware Implicit Reward Policy Optimization (OSIPO). Finally, the student's reasoning representations are transferred to a lightweight profile encoder, enabling shared reasoning signals for remaining users without direct LLM inference. In a billion-scale advertising deployment, ScaleToT increased LT30 by 6.738% in a randomized online A/B test, while its offline reasoning covered only 7.32% of the potential population, drastically cutting compute costs compared to full-population LLM reasoning.
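The summary does not say how the entropy-guided ToT refinement is bounded or scored. The Python sketch below shows one plausible reading under stated assumptions. The teacher LLM is modeled as a hypothetical function `propose` that returns a probability distribution over typed next states. The search branches only where that distribution's entropy exceeds a threshold, and `max_depth`, `max_branch`, and `max_expansions` cap the number of teacher calls. All names, state labels, and toy probabilities are illustrative, not ScaleToT's actual procedure.

```python
import math
from collections import deque
from typing import Callable, Dict, List, Tuple

# A chain is an ordered tuple of typed user-state labels (hypothetical taxonomy).
Chain = Tuple[str, ...]
# The teacher is modeled as a function returning a distribution over next states.
Proposer = Callable[[Chain], Dict[str, float]]


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate next states."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def refine_chain(
    propose: Proposer,
    max_depth: int = 4,
    max_branch: int = 2,
    entropy_threshold: float = 0.5,
    max_expansions: int = 16,
) -> List[Tuple[Chain, float]]:
    """Bounded, entropy-guided tree search over typed user-state chains.

    Confident (low-entropy) steps keep only the top candidate; uncertain
    (high-entropy) steps branch into the top `max_branch` candidates.
    `max_depth` and `max_expansions` bound the total number of teacher calls.
    Returns completed chains with their log-probability, best first.
    """
    frontier = deque([((), 0.0)])
    finished: List[Tuple[Chain, float]] = []
    expansions = 0
    while frontier:
        chain, logp = frontier.popleft()
        if len(chain) == max_depth:
            finished.append((chain, logp))
            continue
        if expansions >= max_expansions:
            continue  # budget exhausted: unfinished partial chains are dropped
        dist = propose(chain)
        expansions += 1
        width = max_branch if entropy(dist) > entropy_threshold else 1
        ranked = sorted(((s, p) for s, p in dist.items() if p > 0),
                        key=lambda kv: kv[1], reverse=True)
        for state, p in ranked[:width]:
            frontier.append((chain + (state,), logp + math.log(p)))
    return sorted(finished, key=lambda item: item[1], reverse=True)


# Toy teacher with hand-set probabilities (illustrative only).
TOY_TEACHER = {
    (): {"lapsed": 0.5, "new": 0.5},
    ("lapsed",): {"price_sensitive": 0.9, "content_seeker": 0.1},
    ("new",): {"price_sensitive": 0.5, "content_seeker": 0.5},
}
chains = refine_chain(lambda c: TOY_TEACHER[c], max_depth=2)
```

Spending extra branches only on uncertain steps is what keeps the search bounded: a confident step costs a single teacher call, so cost grows with the number of ambiguous states rather than with the full branching factor at every depth.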
Key takeaway
For Machine Learning Engineers building user models for billion-scale, low-activity populations, ScaleToT offers a viable path to applying LLM reasoning without prohibitive costs. If you face sparse user data or expensive LLM inference, consider a teacher-student distillation approach: reason with the LLM on a small subset of users, distill that reasoning into a student model, and serve everyone else through a lightweight encoder. In the reported deployment, this combination delivered a 6.738% LT30 increase while offline LLM reasoning covered only 7.32% of the potential population.
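The compute saving can be made concrete with a little arithmetic. The sketch assumes cost is dominated by per-user LLM reasoning passes. Only the 7.32% coverage figure comes from the reported deployment; the 1-billion-user population and the encoder cost ratio are hypothetical.

```python
def relative_serving_cost(covered_fraction: float, encoder_cost_ratio: float = 0.0) -> float:
    """Compute cost relative to running LLM reasoning for every user (normalized to 1.0).

    covered_fraction: share of users that receive offline LLM reasoning.
    encoder_cost_ratio: hypothetical per-user cost of the lightweight profile
        encoder, expressed as a fraction of one LLM reasoning pass.
    """
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be in [0, 1]")
    # Covered users pay a full LLM pass; the remaining users pay only the encoder.
    return covered_fraction + (1.0 - covered_fraction) * encoder_cost_ratio


# 7.32% coverage is the reported figure; population and encoder cost are assumptions.
population = 1_000_000_000
covered_fraction = 0.0732
llm_passes = round(population * covered_fraction)
cost_llm_only = relative_serving_cost(covered_fraction)
cost_with_encoder = relative_serving_cost(covered_fraction, encoder_cost_ratio=0.001)
```

Because ToT refinement makes several teacher calls per user, treat one "LLM reasoning pass" as the full per-user ToT cost; the ratio is unaffected as long as that per-user cost is the same in both scenarios.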
Key insights
ScaleToT generalizes LLM reasoning for sparse user data by training a student model from LLM-generated chains, then transferring it to a lightweight encoder.
Principles
- Structured LLM reasoning can be distilled.
- Sparse data can be augmented via teacher models.
- Costly per-user LLM inference can be confined to a small subset of users.
Method
ScaleToT constructs typed user-state chains with entropy-guided ToT, trains a student model via SFT and OSIPO on static profiles, then transfers reasoning to a lightweight profile encoder.
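The summary names OSIPO but does not give its objective. As a clearly labeled stand-in, the sketch below uses the DPO-style implicit reward (β times the student's log-probability ratio against a frozen reference) and adds two hypothetical twists suggested by the name. "Outcome-driven" is modeled as preferring, per user, the chain whose prediction matched the observed outcome. "Segment-aware" is modeled as per-segment loss weights. `ChainPair`, the segment labels, and the weights are assumptions, not the paper's definitions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChainPair:
    """Two reasoning chains for one user, ranked by downstream outcome (hypothetical)."""
    segment: str              # e.g. an activity or lifecycle segment label
    logp_chosen: float        # student log-prob of the chain whose prediction matched the outcome
    logp_rejected: float      # student log-prob of the other chain
    ref_logp_chosen: float    # same quantities under a frozen reference (e.g. the SFT checkpoint)
    ref_logp_rejected: float


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp: float, ref_logp: float, beta: float) -> float:
    """DPO-style implicit reward for one chain: beta * log(pi / pi_ref)."""
    return beta * (logp - ref_logp)


def segment_weighted_preference_loss(
    pairs: List[ChainPair], segment_weights: Dict[str, float], beta: float = 0.1
) -> float:
    """Weighted mean of -log sigmoid(reward margin); unlisted segments get weight 1.0."""
    total, weight_sum = 0.0, 0.0
    for pair in pairs:
        w = segment_weights.get(pair.segment, 1.0)
        margin = (implicit_reward(pair.logp_chosen, pair.ref_logp_chosen, beta)
                  - implicit_reward(pair.logp_rejected, pair.ref_logp_rejected, beta))
        total += -w * log_sigmoid(margin)
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("need at least one pair with non-zero weight")
    return total / weight_sum


# Toy values: the first pair already favors the chosen chain; the second is neutral.
pairs = [
    ChainPair("low_activity", -10.0, -12.0, -11.0, -11.0),
    ChainPair("dormant", -11.0, -11.0, -11.0, -11.0),
]
loss = segment_weighted_preference_loss(pairs, {"low_activity": 2.0})
```

When the student matches the reference, every margin is zero and the loss equals log 2; it falls below that only as the student raises the relative likelihood of outcome-consistent chains, and up-weighted segments pull harder on that gradient.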
In practice
- Improve LTV prediction for low-activity users.
- Reduce LLM inference costs in advertising.
- Model user states from static profiles.
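The final transfer step can be illustrated with a deliberately simple encoder. The sketch assumes the student's reasoning representation is a fixed-size embedding per covered user. It fits a ridge-regression map from static profile features to those embeddings and then applies the map to uncovered users, who never pass through the LLM. The linear encoder, the dimensions, and the synthetic data are hypothetical; the summary says only that the encoder is lightweight.

```python
import numpy as np


def fit_profile_encoder(profiles: np.ndarray, student_emb: np.ndarray, l2: float = 1e-2) -> np.ndarray:
    """Ridge regression from static profile features to student reasoning embeddings.

    Solves (X^T X + l2 * I) W = X^T Z for W, where X holds profiles of LLM-covered
    users and Z holds the student's reasoning embeddings for those same users.
    """
    d = profiles.shape[1]
    gram = profiles.T @ profiles + l2 * np.eye(d)
    return np.linalg.solve(gram, profiles.T @ student_emb)


def encode(profiles: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Serve reasoning-aligned embeddings from profiles alone (no LLM call)."""
    return profiles @ weights


# Synthetic stand-in data; dimensions and the hidden linear map are illustrative.
rng = np.random.default_rng(0)
n_covered, n_rest, d_profile, d_emb = 500, 2000, 16, 8
true_map = rng.normal(size=(d_profile, d_emb))
covered = rng.normal(size=(n_covered, d_profile))
student = covered @ true_map + 0.01 * rng.normal(size=(n_covered, d_emb))

W = fit_profile_encoder(covered, student)
rest = rng.normal(size=(n_rest, d_profile))
rest_emb = encode(rest, W)
```

A real deployment might use a small neural network trained with a regression or contrastive loss instead, but the contract is the same: static profiles in, reasoning-aligned embeddings out, with no LLM call at serving time.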
Topics
- ScaleToT
- User Modeling
- Large Language Models
- Tree-of-Thought
- Model Distillation
- Lifetime Value Prediction
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.