How Amazon Uses LLMs to Recommend Products
Summary
Amazon developed COSMO, a large-scale e-commerce commonsense knowledge graph and serving system, to bridge the "semantic gap" in product search and recommendations. Traditional systems struggle with implicit user intent, such as understanding that "pregnant women" need "slip-resistant shoes." COSMO addresses this by using large language models (LLMs) such as OPT-175B and OPT-30B to extract commonsense explanations from user behavior data. The system processes millions of query-purchase and co-purchase pairs and generates candidate explanations, which are then refined through a multi-stage pipeline: rule-based filtering, similarity filtering, human-in-the-loop annotation, and classifier generalization. The result is a knowledge graph of 6.3 million nodes and 29 million edges across 18 product categories. For real-time inference, Amazon instruction-tuned the smaller LLaMA 7B and 13B models into COSMO-LM, which can generate and evaluate commonsense knowledge on the fly. Deployed with a Feature Store and an Asynchronous Cache Store, COSMO improved search relevance (e.g., a 60% Macro F1 gain on the ESCI dataset with frozen encoders), session-based recommendations, and search navigation. In A/B tests it produced a 0.7% relative increase in product sales and an 8% increase in navigation engagement, gains projected to be worth billions in potential annual revenue.
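The serving side mentioned above can be pictured with a small sketch. It assumes the cache answers immediately from stored knowledge and defers model calls for unseen queries to a later batch step. The class `AsyncKnowledgeCache`, its methods, and the `generate` callable are hypothetical names chosen for illustration, not Amazon's implementation.

```python
from collections import deque


class AsyncKnowledgeCache:
    """Toy cache: serve stored knowledge now, generate for misses later."""

    def __init__(self, generate):
        self._generate = generate   # slow model call (hypothetical stand-in for COSMO-LM)
        self._cache = {}            # query -> generated knowledge
        self._pending = deque()     # queries waiting for the batch step
        self._queued = set()        # avoids queueing the same query twice

    def lookup(self, query):
        """Return cached knowledge, or None on a miss (the query is queued)."""
        hit = self._cache.get(query)
        if hit is None and query not in self._queued:
            self._pending.append(query)
            self._queued.add(query)
        return hit

    def refresh(self):
        """Batch step: run the model on queued queries and fill the cache."""
        processed = 0
        while self._pending:
            query = self._pending.popleft()
            self._cache[query] = self._generate(query)
            self._queued.discard(query)
            processed += 1
        return processed
```

The point of this design is that the request path never waits on an LLM call: a miss costs nothing at request time, and the model only runs in the background step.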
Key takeaway
AI Engineers building recommendation or search systems can learn from Amazon's COSMO architecture. Your team should consider implementing a multi-stage filtering pipeline for LLM-generated knowledge and instruction-tuning smaller models for production to achieve both scale and accuracy. This approach can improve how well a system understands user intent and can drive substantial business impact, as suggested by the hundreds of millions in additional annual revenue attributed to COSMO at Amazon.
Key insights
Amazon's COSMO system uses LLMs and a multi-stage pipeline to build and serve commonsense knowledge for e-commerce.
Principles
- Bridge semantic gaps with commonsense knowledge.
- Filter LLM outputs rigorously for quality.
- Instruction-tune smaller models for production inference.
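As a rough illustration of the third principle, the sketch below turns one annotated record into instruction-tuning examples for two tasks, generation and evaluation, which are the two capabilities the summary attributes to COSMO-LM. The prompt wording, the field names, and the function `build_examples` are assumptions made for this example and do not come from the original article.

```python
def build_examples(query, product, explanation, is_plausible):
    """Build instruction-tuning examples from one annotated record.

    Every record yields an "evaluate" example. Only explanations judged
    plausible are also used as "generate" targets, so the model is never
    trained to produce a rejected explanation.
    """
    context = f"Query: {query}\nProduct: {product}"
    examples = [
        {
            "task": "evaluate",
            "prompt": (
                f"{context}\nExplanation: {explanation}\n"
                "Is this explanation plausible? Answer yes or no."
            ),
            "target": "yes" if is_plausible else "no",
        }
    ]
    if is_plausible:
        examples.append(
            {
                "task": "generate",
                "prompt": f"{context}\nExplain why this shopper bought this product.",
                "target": explanation,
            }
        )
    return examples
```

Mixing both example types in one training set is one simple way to get a single small model that can both propose and score knowledge.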
Method
Amazon's COSMO system generates commonsense knowledge by feeding user behavior data into LLMs, filtering the outputs, having humans annotate the results, and training classifiers to generalize those annotations. It then instruction-tunes smaller LLMs for efficient, real-time inference.
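The first two filtering stages can be sketched in a few lines. This is a minimal illustration, not the actual COSMO pipeline: the `Candidate` type, the word-count rule, the token-overlap (Jaccard) measure, and both thresholds are assumptions picked to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    query: str        # what the user searched for
    product: str      # what the user bought
    explanation: str  # LLM-generated reason linking the two


def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def passes_rules(c, min_words=3):
    # Rule-based stage: drop explanations too short to be informative.
    return len(c.explanation.split()) >= min_words


def passes_similarity(c, max_overlap=0.6):
    # Similarity stage: drop explanations that mostly restate the query
    # or the product title instead of adding new knowledge.
    overlap = max(jaccard(c.explanation, c.query), jaccard(c.explanation, c.product))
    return overlap <= max_overlap


def filter_candidates(candidates):
    """Keep only candidates that survive both automatic stages."""
    return [c for c in candidates if passes_rules(c) and passes_similarity(c)]
```

Candidates that survive these cheap checks would then go on to the more expensive human annotation and classifier stages.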
In practice
- Use query specificity to prioritize ambiguous queries.
- Decompose human annotation tasks into yes/no questions.
- Employ multi-task training for smaller LLMs.
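The first two practices above can be illustrated with a short sketch. It assumes, purely for illustration, that query ambiguity is approximated by the Shannon entropy of the product categories bought after a query, and that annotation is split into three yes/no questions. Neither the entropy proxy nor the question names in `QUESTIONS` are taken from the original article.

```python
import math
from collections import Counter


def purchase_entropy(purchased_categories):
    """Shannon entropy (bits) of the categories bought after one query."""
    counts = Counter(purchased_categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def rank_by_ambiguity(query_to_categories):
    """Return queries sorted from most to least ambiguous (highest entropy first)."""
    return sorted(
        query_to_categories,
        key=lambda q: purchase_entropy(query_to_categories[q]),
        reverse=True,
    )


# Hypothetical yes/no questions shown to an annotator for one candidate.
QUESTIONS = ("is_complete_sentence", "is_relevant", "is_plausible")


def accept(answers):
    """Accept a candidate only if every question was answered yes."""
    return all(answers.get(q, False) for q in QUESTIONS)
```

Broad queries such as a general activity spread purchases over many categories and score high, so they are processed first. Splitting the annotation into separate binary questions keeps each judgment simple and makes disagreements easier to trace.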
Topics
- COSMO Knowledge Graph
- Large Language Models
- E-commerce Recommendations
- Semantic Gap
- Instruction Tuning
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by ByteByteGo Newsletter.