Jensen's Inequality - Why the Average of a Curve is Not the Curve of the Average
Summary
Convexity, the property of a curve that bends upward like a bowl, is introduced through the "chord test": a function is convex if the chord drawn between any two points on its graph lies on or above the curve. Examples include $x^2$, $a^x$ (for $a > 0$), and $|x|$. Concave functions, such as the logarithm, do the opposite: the chord lies on or below the curve. Jensen's inequality generalizes the chord test from two points to any number of points weighted by probabilities. For a convex function $f$, $f(E[x]) \le E[f(x)]$: the function of the expectation is at most the expectation of the function. The inequality flips for concave functions. A concrete example shows that for $f(x) = x^2$, the Jensen gap $E[f(x)] - f(E[x])$ is exactly the variance of $x$, which ties the gap created by curvature to the spread of $x$.
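The variance identity can be checked numerically. The following is a minimal sketch, not code from the original article: the distribution in `values` and `probs` and the helper `expectation` are hypothetical, chosen only for illustration.

```python
# Hypothetical discrete distribution, chosen only for illustration.
values = [-1.0, 0.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]


def expectation(g, values, probs):
    """Return E[g(x)] for a discrete distribution given as values and probabilities."""
    return sum(p * g(v) for v, p in zip(values, probs))


def square(x):
    return x * x


mean = expectation(lambda x: x, values, probs)  # E[x]
e_of_f = expectation(square, values, probs)     # E[f(x)] with f(x) = x^2
f_of_e = square(mean)                           # f(E[x])

# Jensen gap for the convex function x^2.
jensen_gap = e_of_f - f_of_e

# Variance computed directly from its definition, E[(x - E[x])^2].
variance = expectation(lambda x: (x - mean) ** 2, values, probs)

print("Jensen gap:", jensen_gap)
print("Variance:  ", variance)
```

The two printed numbers should agree up to floating-point rounding, and the gap should be non-negative, as Jensen's inequality requires for a convex $f$.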
Key takeaway
For data scientists and machine learning engineers working with probabilistic models, understanding Jensen's inequality is crucial. It provides a fundamental tool for deriving bounds and inequalities, such as the non-negativity of KL divergence or the validity of the ELBO in variational inference. Recognizing whether a function is convex or concave tells you which way the inequality points, and therefore whether moving an expectation inside the function gives an upper or a lower bound.
Key insights
Convexity and Jensen's inequality show how a function's curvature determines the relationship between the function of the expectation and the expectation of the function.
Principles
- A function is convex if every chord between two points on its graph lies on or above the curve.
- Jensen's inequality generalizes convexity to probabilistic averages.
- Curvature creates a gap between $E[f(x)]$ and $f(E[x])$; the two coincide when $f$ is linear.
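The chord test in the principles above can be sketched numerically. This is an illustrative check under stated assumptions, not a proof: the helper names `chord_gap`, `looks_convex`, and `looks_concave` are hypothetical, and the test samples a single chord on a finite grid, so passing it is necessary for convexity on that interval but not sufficient.

```python
import math


def chord_gap(f, a, b, t):
    """Chord height minus curve height at the point t*a + (1 - t)*b."""
    return t * f(a) + (1 - t) * f(b) - f(t * a + (1 - t) * b)


def chord_gaps(f, a, b, steps=50):
    """Sample the chord gap at evenly spaced weights t in [0, 1]."""
    return [chord_gap(f, a, b, i / steps) for i in range(steps + 1)]


def looks_convex(f, a, b, tol=1e-12):
    """True if the sampled chord never dips below the curve (up to tol)."""
    return all(g >= -tol for g in chord_gaps(f, a, b))


def looks_concave(f, a, b, tol=1e-12):
    """True if the sampled chord never rises above the curve (up to tol)."""
    return all(g <= tol for g in chord_gaps(f, a, b))


print(looks_convex(lambda x: x * x, -3.0, 4.0))   # x^2
print(looks_convex(lambda x: 2.0 ** x, -3.0, 4.0))  # a^x with a = 2
print(looks_convex(abs, -3.0, 4.0))               # |x|
print(looks_concave(math.log, 0.5, 10.0))         # logarithm
```

With weights $t$ and $1 - t$ read as probabilities on the two endpoints, a non-negative chord gap is Jensen's inequality for a two-point distribution.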
In practice
- Use Jensen's inequality to prove the non-negativity of KL divergence.
- Apply Jensen's inequality to establish lower bounds in variational inference.
- Read variance as the Jensen gap of the convex function $f(x)=x^2$.
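The first two items above each follow from one application of Jensen's inequality to the concave logarithm. As a sketch, the notation here is an assumption for illustration and is not taken from the original article: $p$ and $q$ are discrete distributions, $z$ is a latent variable with joint $p(x, z)$, and $q(z)$ is an approximating distribution that is positive wherever $p(x, z)$ is.

$$
-D_{\mathrm{KL}}(p \,\|\, q) = E_{p}\!\left[\log \frac{q(x)}{p(x)}\right] \le \log E_{p}\!\left[\frac{q(x)}{p(x)}\right] = \log \sum_{x:\, p(x) > 0} q(x) \le \log 1 = 0
$$

$$
\log p(x) = \log E_{q(z)}\!\left[\frac{p(x, z)}{q(z)}\right] \ge E_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]
$$

The first line gives $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$; the right-hand side of the second line is the ELBO, a lower bound on $\log p(x)$.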
Topics
- Jensen's Inequality
- Convex Functions
- Concave Functions
- Chord Test
- Variance
Best for: AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by DataMListic.