Loss-Shift Transfer via Bayes Quotients
Summary
A new research paper identifies a failure mode called "loss shift," orthogonal to traditional distribution shift: the data distribution stays fixed, but the loss function changes. Even under the same joint law P(X,Y), different representations may then be required, because the loss determines which information in X is Bayes-relevant. The paper formalizes this with Bayes quotients, which order losses by refinement. A key finding is that a source-minimal representation for a coarser loss is insufficient for a strictly finer target loss. For finite-output log loss, this obstruction can be quantified: the excess risk equals the conditional information about Y that the representation discards. Experiments in controlled, learned, synthetic-image, and real-image settings confirm the predicted effect. They show that classification-equivalent representations can differ in optimal log-loss performance under a fixed data distribution.
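The log-loss statement can be checked numerically on a toy problem. The sketch below is illustrative only and is not the paper's code. The joint law `P`, the representation `phi`, and all helper names are hypothetical. It assumes a deterministic representation Z = phi(X). In that case, the standard identity H(Y|Z) − H(Y|X) = I(X;Y|Z) gives the excess optimal log loss. The chosen `phi` merges inputs that share a Bayes class, so the optimal 0-1 risk is unchanged while the optimal log loss gets worse.

```python
import math
from collections import defaultdict

# Hypothetical finite joint law P(X, Y) with X in {0, 1, 2, 3} and Y in {0, 1}.
P = {
    (0, 0): 0.20, (0, 1): 0.05,  # P(Y=0 | X=0) = 0.8
    (1, 0): 0.15, (1, 1): 0.10,  # P(Y=0 | X=1) = 0.6
    (2, 0): 0.05, (2, 1): 0.20,  # P(Y=0 | X=2) = 0.2
    (3, 0): 0.10, (3, 1): 0.15,  # P(Y=0 | X=3) = 0.4
}

# Hypothetical representation Z = phi(X): merges inputs that share a Bayes class.
phi = {0: "a", 1: "a", 2: "b", 3: "b"}

def push_forward(joint, f):
    """Joint law of (f(X), Y)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f[x], y)] += p
    return dict(out)

def cond_entropy(joint):
    """H(Y | W) in nats for a joint law {(w, y): p}."""
    pw = defaultdict(float)
    for (w, _), p in joint.items():
        pw[w] += p
    return -sum(p * math.log(p / pw[w]) for (w, _), p in joint.items() if p > 0)

def cond_mutual_info(joint, f):
    """I(X; Y | Z) for Z = f(X), computed directly from its definition."""
    px, pz, pzy = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        pz[f[x]] += p
        pzy[(f[x], y)] += p
    return sum(p * math.log(p * pz[f[x]] / (px[x] * pzy[(f[x], y)]))
               for (x, y), p in joint.items() if p > 0)

def bayes_01_risk(joint):
    """Minimal 0-1 risk: 1 - sum_w max_y P(w, y)."""
    best = defaultdict(float)
    for (w, _), p in joint.items():
        best[w] = max(best[w], p)
    return 1.0 - sum(best.values())

PZ = push_forward(P, phi)
excess_log = cond_entropy(PZ) - cond_entropy(P)  # H(Y|Z) - H(Y|X)
print("0-1 Bayes risk from X vs Z:", bayes_01_risk(P), bayes_01_risk(PZ))
print("excess log loss:", excess_log, "I(X;Y|Z):", cond_mutual_info(P, phi))
```

Here the two representations (X itself and Z) are classification-equivalent, yet Z has strictly higher optimal log loss. This is the kind of gap the summary describes.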
Key takeaway
For AI scientists optimizing models: if you change the loss function while the data distribution stays fixed, your current representations may become insufficient. This loss shift calls for re-evaluating or re-learning representations to avoid suboptimal performance, even when they are classification-equivalent. Account for the Bayes-relevant information that the new loss dictates, since a source-minimal representation for a coarser loss will not suffice for a strictly finer target loss.
Key insights
Loss shift, a failure mode distinct from distribution shift, occurs when the loss function changes while the data distribution stays fixed, and it can require different data representations.
Principles
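- A loss function determines which information in X is Bayes-relevant.
- A source-minimal representation for a coarser loss is insufficient for a strictly finer target loss.
- For finite-output log loss, the excess risk equals the conditional information about Y that the representation discards.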
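The first two principles can be made concrete by building Bayes quotients for two losses on a hypothetical finite input space. As a working assumption, which may differ from the paper's exact definition, the Bayes quotient of a loss is taken here to group inputs whose posteriors P(Y | x) yield the same set of Bayes-optimal actions. Under 0-1 loss only the argmax set matters. Under log loss the Bayes act is the whole posterior. The posteriors and function names below are invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical posteriors P(Y | X = x) over three labels, kept exact with Fraction.
posterior = {
    "x1": (F(6, 10), F(3, 10), F(1, 10)),
    "x2": (F(5, 10), F(4, 10), F(1, 10)),
    "x3": (F(6, 10), F(3, 10), F(1, 10)),
    "x4": (F(2, 10), F(7, 10), F(1, 10)),
}

def bayes_acts_01(q):
    """0-1 loss: the Bayes acts are the labels of maximal posterior mass."""
    top = max(q)
    return frozenset(i for i, v in enumerate(q) if v == top)

def bayes_acts_log(q):
    """Log loss (strictly proper): the unique Bayes act is the posterior itself."""
    return q

def bayes_quotient(posterior, acts):
    """Partition the inputs by equality of their Bayes-act sets."""
    blocks = {}
    for x, q in posterior.items():
        blocks.setdefault(acts(q), set()).add(x)
    return {frozenset(b) for b in blocks.values()}

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

q01 = bayes_quotient(posterior, bayes_acts_01)
qlog = bayes_quotient(posterior, bayes_acts_log)
print(sorted(sorted(b) for b in q01))   # x1, x2, x3 share argmax label 0
print(sorted(sorted(b) for b in qlog))  # only x1 and x3 share a posterior
print("log-loss quotient refines 0-1 quotient:", refines(qlog, q01))
```

Equal posteriors always give equal argmax sets, so under this definition the log-loss quotient refines the 0-1 quotient for any law. The example shows that the refinement can be strict.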
Method
The paper formalizes loss shift with Bayes quotients, which order losses by refinement, and identifies when a source-minimal representation becomes insufficient for a strictly finer target loss.
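The insufficiency result can be illustrated with general finite loss functions. In this sketch, everything is hypothetical:

- a four-label problem;
- a source loss that scores only the superclass of the label;
- a target loss that scores the fine label;
- a fixed law over three inputs.

Two simplifying assumptions apply. First, the Bayes quotient is taken to be the partition of X by equality of Bayes-act sets. Second, "strictly finer" is checked only for this one law; the paper's definitions may be more general. The source quotient serves as the source-minimal representation. The code checks that this representation loses no source risk but incurs positive excess target risk.

```python
from fractions import Fraction as F

LABELS = range(4)
SUPER = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical grouping of fine labels into superclasses
SRC_ACTIONS, TGT_ACTIONS = range(2), range(4)

def loss_src(a, y):
    """Source loss (hypothetical): 0-1 loss on the superclass of y."""
    return 0 if SUPER[y] == a else 1

def loss_tgt(a, y):
    """Target loss (hypothetical): 0-1 loss on the fine label y."""
    return 0 if a == y else 1

# Hypothetical fixed law: P(X = x) = 1/3 and exact posteriors P(Y | X = x).
px = {x: F(1, 3) for x in range(3)}
post = {
    0: [F(5, 10), F(3, 10), F(1, 10), F(1, 10)],
    1: [F(2, 10), F(6, 10), F(1, 10), F(1, 10)],
    2: [F(1, 20), F(1, 20), F(6, 10), F(3, 10)],
}

def risks(w, loss, actions):
    """Risk of each action against (possibly unnormalized) label weights w."""
    return {a: sum(loss(a, y) * w[y] for y in LABELS) for a in actions}

def bayes_acts(w, loss, actions):
    """Set of actions attaining the minimal risk."""
    r = risks(w, loss, actions)
    best = min(r.values())
    return frozenset(a for a, v in r.items() if v == best)

def quotient(loss, actions):
    """Partition of X by equality of Bayes-act sets under `loss`."""
    blocks = {}
    for x in post:
        blocks.setdefault(bayes_acts(post[x], loss, actions), set()).add(x)
    return {frozenset(b) for b in blocks.values()}

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

def bayes_risk(partition, loss, actions):
    """Minimal expected loss for a predictor that only sees the block of x."""
    total = F(0)
    for block in partition:
        w = [sum(px[x] * post[x][y] for x in block) for y in LABELS]
        total += min(risks(w, loss, actions).values())
    return total

identity = {frozenset({x}) for x in post}  # full information about X
q_src = quotient(loss_src, SRC_ACTIONS)     # source-minimal representation
q_tgt = quotient(loss_tgt, TGT_ACTIONS)

excess_tgt = (bayes_risk(q_src, loss_tgt, TGT_ACTIONS)
              - bayes_risk(identity, loss_tgt, TGT_ACTIONS))
print("target quotient strictly finer:", refines(q_tgt, q_src) and not refines(q_src, q_tgt))
print("excess target risk of source-minimal representation:", excess_tgt)
```

The source quotient merges inputs 0 and 1, which share a superclass but have different fine-label Bayes acts. No predictor that sees only the merged block can recover the target-optimal decision for both inputs.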
In practice
- Expect classification-equivalent representations to differ in optimal log-loss performance.
- Re-examine representations when the loss function changes, even if the data distribution is fixed.
Topics
- Loss Shift
- Bayes Quotients
- Transfer Learning
- Representation Learning
- Log Loss
- Machine Learning Theory
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.