Learning to Dequantise with Truncated Flows
Summary
The article, published on April 8, 2022, introduces Truncated Flows (TRUFL), a method for learning to dequantize in autoregressive language modeling that addresses the posterior collapse encountered with stochastic embeddings. TRUFL is presented as a middle ground between Categorical Normalizing Flows (CatNF) and Argmax Flow. It aims to allow deterministic decoding while still letting the model learn the support of each category. Argmax Flow partitions the latent space using a pre-specified binary encoding, whereas TRUFL learns each category's support during training. The method relies on the reparameterization trick. It samples from a truncated uniform distribution and maps the sample through an inverse CDF, using the Logistic distribution because both its cumulative and inverse cumulative distribution functions are tractable. Animations from the experiments show TRUFL learning truncations that make decoding nearly deterministic. When the reconstruction loss is weighted heavily, TRUFL also reaches a lower KL term than CatNF.
Key takeaway
For research scientists developing autoregressive language models with stochastic embeddings, consider integrating Truncated Flows (TRUFL) to mitigate posterior collapse. This approach learns the support of each category and can make decoding nearly deterministic, so each latent sample maps back to essentially one token. Your team should evaluate TRUFL as an alternative to CatNF or Argmax Flow, especially when balancing reconstruction accuracy against KL divergence.
Key insights
TRUFL enables learning dequantization with truncated flows, addressing posterior collapse in stochastic embeddings for language models.
Principles
- Optimal decoding encourages the categories' latent supports to be distinct.
- A tractable inverse CDF lets a truncated uniform sample be transformed into a sample from the target distribution.
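To make the first principle concrete, here is a minimal sketch, assuming each category x has its own Logistic location and scale plus truncation bounds (a, b) on the uniform. The names `truncated_logistic_pdf` and `optimal_decoder` and all parameter values are hypothetical. The sketch computes the Bayes-optimal decoder p(x | z) ∝ q(z | x) p(x). When the truncated supports do not overlap, exactly one category has nonzero density at z, so the optimal decoder is deterministic. When they overlap, it has to split probability between categories.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable sigmoid, i.e. the standard Logistic CDF."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def truncated_logistic_pdf(z, loc, scale, a, b):
    """Density of z = loc + scale * logit(u) with u ~ Uniform(a, b), 0 < a < b < 1."""
    u = sigmoid((z - loc) / scale)  # untruncated Logistic CDF evaluated at z
    if not (a < u < b):
        return 0.0  # z lies outside this category's truncated support
    logistic_pdf = u * (1.0 - u) / scale  # Logistic density at z
    return logistic_pdf / (b - a)  # renormalise by the retained mass b - a


def optimal_decoder(z, params, prior):
    """Bayes-optimal decoder p(x | z), proportional to q(z | x) p(x)."""
    weights = {x: truncated_logistic_pdf(z, *params[x]) * prior[x] for x in params}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("z lies outside every category's support")
    return {x: w / total for x, w in weights.items()}


# Hypothetical example: two categories sharing loc=0, scale=1 but with
# disjoint truncations, versus two categories with identical truncations.
disjoint = {"A": (0.0, 1.0, 0.05, 0.45), "B": (0.0, 1.0, 0.55, 0.95)}
overlapping = {"C": (0.0, 1.0, 0.10, 0.90), "D": (0.0, 1.0, 0.10, 0.90)}
z = math.log(0.2 / 0.8)  # logit(0.2): inside A's support, outside B's

print(optimal_decoder(z, disjoint, {"A": 0.5, "B": 0.5}))  # {'A': 1.0, 'B': 0.0}
print(optimal_decoder(z, overlapping, {"C": 0.5, "D": 0.5}))  # {'C': 0.5, 'D': 0.5}
```

The disjoint case gives zero reconstruction error, which is why a decoder trained toward optimality rewards truncations that separate the categories.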
Method
TRUFL samples from a uniform distribution truncated to an interval $(a(x), b(x))$ within $(0,1)$ and transforms the sample with a differentiable inverse CDF (for example, the logit function, which is the inverse CDF of the Logistic distribution). This parameterizes distributions whose support is limited and learnable.
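The sampling step can be sketched with the reparameterization trick as follows. This is a minimal, framework-free sketch assuming a single scalar latent per token with hypothetical per-category parameters `loc`, `scale`, `a` and `b`; the function name `sample_trufl` is made up for illustration. The only randomness is eps ~ Uniform(0, 1), so in an autodiff framework the same expressions would give gradients with respect to a and b as well as loc and scale. The log-density follows from the change of variables: q(z) is the Logistic density at z divided by the retained mass b - a.

```python
import math
import random


def logit(u: float) -> float:
    """Inverse CDF of the standard Logistic distribution."""
    return math.log(u) - math.log1p(-u)


def sample_trufl(loc, scale, a, b, rng):
    """Draw one reparameterised sample z and return (z, log q(z)).

    Assumes 0 < a < b < 1 and scale > 0.
    """
    eps = rng.random()  # parameter-free noise, eps ~ Uniform[0, 1)
    u = a + (b - a) * eps  # u ~ Uniform[a, b); differentiable in a and b
    z = loc + scale * logit(u)  # inverse Logistic CDF maps u into the truncated support
    # Change of variables: dz/du = scale / (u * (1 - u)), so
    # q(z) = u * (1 - u) / (scale * (b - a)).
    log_q = math.log(u) + math.log1p(-u) - math.log(scale) - math.log(b - a)
    return z, log_q


rng = random.Random(0)
loc, scale, a, b = 0.0, 1.0, 0.25, 0.75
samples = [sample_trufl(loc, scale, a, b, rng) for _ in range(20000)]

# Every sample lies in the truncated support [loc + scale*logit(a), loc + scale*logit(b)].
lo, hi = loc + scale * logit(a), loc + scale * logit(b)
print(all(lo <= z <= hi for z, _ in samples))  # True

# Sanity check: E_q[1 / q(z)] equals the length of the support, hi - lo.
estimate = sum(math.exp(-lq) for _, lq in samples) / len(samples)
print(round(estimate, 2), round(hi - lo, 2))
```

The Monte Carlo check works because the integral of q(z) * (1 / q(z)) over the support is just the support's length, so a wrong log-density would show up as a mismatch.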
In practice
- Use Logistic distribution for tractable CDF/inverse CDF.
- Add two learnable parameters per category for the truncation boundaries $a(x)$ and $b(x)$.
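The two practical points can be combined into a small sketch. It assumes each category owns two unconstrained real parameters, here called `alpha` and `beta` (hypothetical names), which a hypothetical `truncation_bounds` function maps to boundaries satisfying 0 < a < b < 1. The article does not specify this parameterization, so treat it as one possible choice. Because the Logistic CDF and inverse CDF are closed-form, the truncated support (F⁻¹(a), F⁻¹(b)) is cheap to compute.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable sigmoid, equal to the standard Logistic CDF."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_cdf(z, loc=0.0, scale=1.0):
    """CDF of the Logistic(loc, scale) distribution."""
    return sigmoid((z - loc) / scale)


def logistic_icdf(u, loc=0.0, scale=1.0):
    """Inverse CDF of the Logistic(loc, scale) distribution: loc + scale * logit(u)."""
    return loc + scale * (math.log(u) - math.log1p(-u))


def truncation_bounds(alpha, beta):
    """Hypothetical map from two unconstrained reals to 0 < a < b < 1.

    a = sigmoid(alpha); b moves a fraction sigmoid(beta) of the way from a to 1.
    """
    a = sigmoid(alpha)
    b = a + (1.0 - a) * sigmoid(beta)
    return a, b


# Hypothetical per-category parameters (two extra numbers per category).
params = {"the": (-2.0, -1.0), "cat": (0.0, 0.0), "sat": (1.5, 0.5)}
for token, (alpha, beta) in params.items():
    a, b = truncation_bounds(alpha, beta)
    lo, hi = logistic_icdf(a), logistic_icdf(b)
    print(f"{token}: a={a:.3f}, b={b:.3f}, support=({lo:.3f}, {hi:.3f})")
```

Mapping through sigmoids keeps the optimizer working on unconstrained values, so gradient updates can never produce an empty or inverted interval.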
Topics
- Truncated Flows
- Normalizing Flows
- Dequantisation
- Stochastic Embeddings
- Posterior Collapse
Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by when trees fall....