When are likely answers right? On Sequence Probability and Correctness in LLMs

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study quantifies the relationship between sequence probability and correctness in large language models (LLMs) across decoding methods, hyperparameters, and prompt-answer pairs. The researchers found that higher sequence probability often predicts correctness when comparing different prompt-answer pairs within a fixed dataset. However, this correlation does not reliably extend to decoding decisions: increasing sequence probability by changing hyperparameters or decoding methods does not consistently improve accuracy. Sequence probability is also an unreliable indicator of correctness among repeated responses generated from the same prompt. These findings clarify when decoding strategies can actually improve LLM correctness and give practical guidance for self-consistency and verifier-free self-improvement techniques.
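To make the dataset-level claim concrete, here is a minimal Python sketch of one way to measure how well sequence probability separates correct from incorrect answers. It assumes you already have per-token log-probabilities for each answer and a correctness label; the function names, the toy records, and the choice of AUROC as the metric are illustrative assumptions, not the study's actual protocol.

```python
def sequence_logprob(token_logprobs):
    """Log-probability of a whole answer: the sum of its per-token log-probabilities."""
    return sum(token_logprobs)


def auroc(scores, labels):
    """Chance that a random correct answer scores higher than a random incorrect one.

    Ties count as half a win. 0.5 means the score carries no signal.
    """
    pos = [s for s, ok in zip(scores, labels) if ok]
    neg = [s for s, ok in zip(scores, labels) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (per-token log-probs of the answer, was it correct?)
records = [
    ([-0.1, -0.2], True),
    ([-0.3, -0.4, -0.1], True),
    ([-1.2, -0.9], False),
    ([-0.2, -0.5], False),
]

scores = [sequence_logprob(lp) for lp, _ in records]
labels = [ok for _, ok in records]
score = auroc(scores, labels)
print(f"AUROC of sequence log-probability vs. correctness: {score:.2f}")
```

Note that the raw sum penalizes longer answers; whether to length-normalize is a design choice this sketch leaves open.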

Key takeaway

If you are an ML engineer or AI scientist optimizing LLM outputs, do not rely solely on raising sequence probability through decoding methods or hyperparameters to boost accuracy. Sequence probability can indicate correctness across prompt-answer pairs within a dataset, but raising it through decoding changes does not reliably improve accuracy. Focus on how well model likelihood aligns with truth rather than assuming that higher probability always means a better answer, especially when choosing among repeated generations from the same prompt.
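The repeated-generation case can be illustrated with a small Python sketch that contrasts two ways of picking a final answer from several samples of the same prompt: self-consistency (majority vote) and picking the single most probable sample. The sample data and function names are hypothetical, and the toy values are constructed so that the two rules disagree; they are not results from the study.

```python
from collections import Counter


def majority_vote(samples):
    """Self-consistency: return the answer that appears most often among the samples."""
    counts = Counter(answer for answer, _ in samples)
    return counts.most_common(1)[0][0]


def most_probable(samples):
    """Return the answer of the single sample with the highest sequence log-probability."""
    return max(samples, key=lambda sample: sample[1])[0]


# Hypothetical repeated generations for one prompt: (final answer, sequence log-prob)
samples = [
    ("42", -3.1),
    ("42", -2.8),
    ("41", -1.9),  # highest probability, but a minority answer
    ("42", -3.4),
]

voted = majority_vote(samples)
likeliest = most_probable(samples)
print("majority vote:", voted)
print("most probable:", likeliest)
```

When the two rules disagree like this, the summary's finding suggests that the higher-probability sample should not be assumed correct by default.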

Key insights

Higher sequence probability often predicts correctness within datasets, but not across decoding decisions or repeated responses.

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.