Calibrating Generative Models to Feature Distributions with MMD Finetuning
Summary
Kernel Calibrating Generative Models (kCGM) is a method for correcting distributional miscalibration in generative models, where individual samples are plausible but their feature distribution deviates from that of a target set. Direct finetuning on the target set can overfit and gives no explicit control over which features are matched. kCGM instead minimizes the maximum mean discrepancy (MMD) between generated and target feature distributions, using an unbiased score-function estimator for the gradient and KL regularization to keep the model close to the pretrained one. Applied to a target set of 174 antibiotics, kCGM improved feature matching while also increasing chemical validity, an advantage over direct finetuning. The method was also demonstrated on protein and DNA generation tasks, where it adapted autoregressive, continuous-space diffusion, and discrete diffusion models using only feature-level supervision.
Key takeaway
For AI scientists adapting generative models to a specific target distribution, kCGM offers an alternative to direct finetuning. Consider it when you need to improve feature matching while preserving sample validity, especially in domains like drug discovery or protein design. It reduces the overfitting risk of direct finetuning on a small target set and can adapt autoregressive and diffusion models using only feature-level supervision.
Key insights
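kCGM corrects generative model miscalibration by minimizing MMD between generated and target feature distributions, with KL regularization toward the pretrained model; in the antibiotics experiment this also increased chemical validity.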
Principles
- Generative models can produce plausible samples whose feature distribution deviates from the target set's.
- Direct finetuning risks overfitting and validity loss.
- MMD minimization aligns generated and target features.
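To make the MMD principle concrete, here is a minimal NumPy sketch of the standard unbiased estimator of squared MMD between two sets of feature vectors. The Gaussian RBF kernel, the bandwidth, and the toy Gaussian data are assumptions chosen for illustration; the summary does not say which kernel or features kCGM uses.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Drop the diagonal (i == j) terms so the within-set averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

# Toy data (assumption): 2-D Gaussian "features" standing in for real descriptors.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 2))
matched = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as target
shifted = rng.normal(1.5, 1.0, size=(200, 2))   # miscalibrated distribution

mmd_matched = mmd2_unbiased(matched, target)
mmd_shifted = mmd2_unbiased(shifted, target)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

The estimate should be near zero when the two feature sets come from the same distribution and clearly positive when they do not, which is what makes it usable as a calibration objective.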
Method
kCGM minimizes MMD between generated and target feature distributions using an unbiased score-function estimator, with KL regularization to stay near the pretrained model.
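The following is a hedged sketch of how a score-function (REINFORCE-style) gradient of an MMD objective can be combined with a KL penalty, not the authors' implementation. It assumes a toy categorical "generator" over five tokens with one scalar feature per token, an RBF kernel, and an exactly computable KL term; the names `calibration_gradient`, `token_features`, and `kl_weight` are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def rbf(a, b, bandwidth=1.0):
    """RBF kernel matrix between scalar feature arrays a (n,) and b (m,)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bandwidth**2))

def calibration_gradient(logits, ref_logits, token_features, target_features,
                         n_samples, kl_weight, rng):
    """Stochastic gradient of MMD^2 + kl_weight * KL(p_theta || p_ref) w.r.t. logits."""
    log_p = log_softmax(logits)
    probs = np.exp(log_p)
    tokens = rng.choice(len(probs), size=n_samples, p=probs)
    feats = token_features[tokens]

    k_xx = rbf(feats, feats)
    k_xy = rbf(feats, target_features)
    # Per-sample coefficient from the U-statistic form of MMD^2:
    # 2/(n-1) * sum_{j != i} k(x_i, x_j)  -  2/m * sum_j k(x_i, y_j)
    within = 2.0 * (k_xx.sum(axis=1) - np.diag(k_xx)) / (n_samples - 1)
    cross = 2.0 * k_xy.mean(axis=1)
    coeff = within - cross

    # Score of a categorical with logits theta: one_hot(x) - probs.
    score = np.eye(len(probs))[tokens] - probs
    mmd_grad = (coeff[:, None] * score).mean(axis=0)

    # Exact KL gradient for this toy model (a real model would estimate it).
    log_ratio = log_p - log_softmax(ref_logits)
    kl = np.sum(probs * log_ratio)
    kl_grad = probs * (log_ratio - kl)
    return mmd_grad + kl_weight * kl_grad

rng = np.random.default_rng(0)
token_features = np.arange(5, dtype=float)        # feature value of each token
target_features = rng.choice([3.0, 4.0], size=100)  # target set favors tokens 3 and 4
ref_logits = np.zeros(5)                          # "pretrained" model: uniform
logits = ref_logits.copy()

for _ in range(300):
    grad = calibration_gradient(logits, ref_logits, token_features,
                                target_features, n_samples=64,
                                kl_weight=0.01, rng=rng)
    logits -= 0.5 * grad                          # plain gradient descent step

probs_final = np.exp(log_softmax(logits))
mean_init = float(token_features.mean())          # mean feature under the uniform model
mean_final = float(probs_final @ token_features)
print(f"mean feature: {mean_init:.2f} -> {mean_final:.2f}")
```

Because the gradient only needs samples, their features, and log-probability gradients, the same pattern does not require differentiating through the feature extractor, which is consistent with the summary's point about feature-level supervision. The sketch omits variance reduction, which a practical implementation would likely need.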
In practice
- Apply kCGM for targeted drug molecule generation.
- Calibrate autoregressive and diffusion models.
Topics
- Generative Models
- MMD Finetuning
- Distributional Calibration
- Drug Discovery
- Diffusion Models
- Protein Generation
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.