ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, quick

Summary

ScaleToT is a framework for generalizing structured Large Language Model (LLM) reasoning to billion-scale populations of low-activity users, whose interaction histories are sparse. It learns user-state inference from a small subset of users processed by an LLM, then extends that reasoning to the wider population. To make the reasoning more reliable, a teacher constructs typed user-state chains through a bounded, entropy-guided Tree-of-Thought (ToT) refinement procedure. These teacher-curated chains train a student model on static profiles using supervised fine-tuning (SFT) and Outcome-Driven Segment-Aware Implicit Reward Policy Optimization (OSIPO). The student's reasoning representations are then transferred to a lightweight profile encoder, so the remaining users share the reasoning signal without direct LLM inference. In a billion-scale advertising deployment, ScaleToT increased LT30 by 6.738% in a randomized online A/B test while its offline reasoning covered only 7.32% of the potential population, a fraction of the compute cost of full-population LLM reasoning.
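
The summary does not spell out how the bounded, entropy-guided refinement works, so the following is only an illustrative sketch of one plausible reading: branch into several candidate states when the proposal distribution at a step has high Shannon entropy, commit to the single most likely state otherwise, and cap the number of live chains. The names `refine_chains` and `toy_propose`, the state labels, the threshold, and the beam-style cap are all assumptions made for this example, not details from the source.

```python
import math
from typing import Callable, Dict, List, Tuple

Chain = Tuple[str, ...]
Proposer = Callable[[Chain], Dict[str, float]]


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def refine_chains(propose: Proposer, depth: int, threshold: float,
                  top_k: int, max_chains: int) -> List[Tuple[Chain, float]]:
    """Grow typed state chains, branching only at uncertain steps.

    Returns (chain, probability) pairs, most probable first.
    """
    frontier: List[Tuple[Chain, float]] = [((), 1.0)]
    for _ in range(depth):
        candidates: List[Tuple[Chain, float]] = []
        for chain, prob in frontier:
            dist = propose(chain)
            ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
            # High entropy: explore top_k states. Low entropy: commit to one.
            width = top_k if entropy(dist) > threshold else 1
            for state, p in ranked[:width]:
                candidates.append((chain + (state,), prob * p))
        # The bound: never keep more than max_chains partial chains alive.
        candidates.sort(key=lambda cp: cp[1], reverse=True)
        frontier = candidates[:max_chains]
    return frontier


def toy_propose(chain: Chain) -> Dict[str, float]:
    """Hypothetical stand-in for a teacher LLM's next-state distribution."""
    if not chain:
        return {"dormant": 0.5, "browsing": 0.5}      # uncertain first step
    return {"churn_risk": 0.9, "reactivating": 0.1}   # confident second step


chains = refine_chains(toy_propose, depth=2, threshold=0.5, top_k=2, max_chains=4)
```

In this toy run the first step is uncertain and branches into two chains, while the second step is confident and extends each chain by a single state. A real teacher would replace `toy_propose` with scored LLM proposals.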

Key takeaway

For machine learning engineers building user models for billion-scale, low-activity populations, ScaleToT offers a way to apply LLM reasoning without running an LLM over every user. If sparse user data or LLM inference cost is your bottleneck, consider a teacher-student distillation approach: run LLM reasoning on a small subset, train a student on the resulting chains, and serve the rest of the population through a lightweight encoder. In the reported deployment, this yielded a 6.738% LT30 increase while offline LLM reasoning covered only 7.32% of the potential population.
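
The compute argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a uniform per-user reasoning cost and ignores the cost of training the student and the encoder, neither of which the summary quantifies. The population size is a hypothetical round number chosen for illustration, and `reasoning_cost_ratio` is a helper named for this example.

```python
def reasoning_cost_ratio(coverage: float) -> float:
    """How many times fewer users receive LLM reasoning than under full coverage."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return 1.0 / coverage


COVERAGE = 0.0732            # reported: offline reasoning covers 7.32% of the population
POPULATION = 1_000_000_000   # hypothetical round figure, not taken from the source

users_reasoned = round(POPULATION * COVERAGE)  # users that need LLM reasoning
ratio = reasoning_cost_ratio(COVERAGE)         # roughly 13.7x fewer than full coverage
```

Under these assumptions, about 73 million of every billion users need LLM reasoning, and the remainder are served by the lightweight encoder alone.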

Key insights

ScaleToT generalizes LLM reasoning to users with sparse data by training a student model on LLM-generated reasoning chains, then transferring the student's reasoning representations to a lightweight profile encoder.

Method

ScaleToT constructs typed user-state chains with bounded, entropy-guided ToT refinement, trains a student model on static profiles via SFT and OSIPO, then transfers the student's reasoning representations to a lightweight profile encoder.
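
The final transfer step can be illustrated with a minimal representation-distillation sketch. The summary does not state the transfer objective or the encoder architecture, so this example assumes a linear encoder trained with squared error and plain stochastic gradient descent to match the student's representations. The toy profiles and target vectors are made up for the example.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(weights: Matrix, profile: Vector) -> Vector:
    """Linear profile encoder: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, profile)) for row in weights]


def distill(profiles: List[Vector], targets: List[Vector],
            lr: float = 0.1, epochs: int = 200, seed: int = 0) -> Matrix:
    """Fit encoder weights so encode(profile) matches the student's vectors."""
    rng = random.Random(seed)
    in_dim, out_dim = len(profiles[0]), len(targets[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    for _ in range(epochs):
        for profile, target in zip(profiles, targets):
            pred = encode(weights, profile)
            for j in range(out_dim):
                err = pred[j] - target[j]
                for i in range(in_dim):
                    # Gradient of 0.5 * err**2 with respect to weights[j][i].
                    weights[j][i] -= lr * err * profile[i]
    return weights


def mse(weights: Matrix, profiles: List[Vector], targets: List[Vector]) -> float:
    """Mean squared error between encoder outputs and student vectors."""
    errors = [(p - t) ** 2
              for profile, target in zip(profiles, targets)
              for p, t in zip(encode(weights, profile), target)]
    return sum(errors) / len(errors)


# Toy data: three "reasoned" users whose student representations are an exact
# linear function of the profile, so a linear encoder can match them.
profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_reprs = [[2.0, -1.0], [0.5, 1.0], [2.5, 0.0]]

encoder = distill(profiles, student_reprs)
unseen_user = [2.0, 1.0]                     # a user with no LLM reasoning
approx_repr = encode(encoder, unseen_user)   # obtained from the profile alone
```

The point of the sketch is the serving path: once the encoder is fitted on the reasoned subset, an unseen user's representation comes from the static profile alone, with no LLM or student call.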

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.