TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]
Summary
TritonSigmoid is an open-source, padding-aware sigmoid attention kernel for GPUs, developed for single-cell foundation models. Unlike softmax, which makes tokens compete for a fixed attention budget, sigmoid attention lets a model attend strongly to many genes (tokens) at once. That matters for single-cell data, where a cell can express anywhere from 200 to more than 16,000 genes. The kernel handles variable-length padding natively, so it does not spend compute on empty positions. In the reported experiments, TritonSigmoid reaches up to 515 TFLOPS on H100 GPUs, compared with 361 TFLOPS for FlashAttention-2 and 440 TFLOPS for FlashSigmoid. It also showed lower validation loss across six datasets, 25% better cell-type separation, and stable training in a setting where softmax attention diverged catastrophically.
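The difference between the two weighting schemes is easy to see on a toy example. The sketch below is illustrative only and is not taken from the TritonSigmoid code: the scores are made-up values for one query over four genes, three of which are strongly relevant.

```python
import math

def softmax(scores):
    """Normalize scores so the weights sum to 1 (tokens compete)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Squash each score independently into (0, 1); no normalization."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical attention scores: three relevant genes, one irrelevant gene.
scores = [4.0, 4.0, 4.0, -4.0]

softmax_weights = softmax(scores)
sigmoid_weights = sigmoid(scores)

print("softmax:", [round(w, 3) for w in softmax_weights])
print("sigmoid:", [round(w, 3) for w in sigmoid_weights])
```

With softmax, the three relevant genes have to share the budget and each ends up with roughly one third of the weight. With sigmoid, each of them gets a weight close to 1, independent of how many other genes are also relevant.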
Key takeaway
For AI engineers building models on variable-length sequences, especially in genomics or other domains where many features can be relevant at the same time, TritonSigmoid offers performance and stability advantages over traditional softmax attention. Its native padding awareness and `torch.compile` integration simplify development and improve training outcomes, at the cost of a potential memory overhead compared to packed approaches. Consider integrating this kernel to improve model accuracy and training robustness.
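To get a feel for the memory trade-off against packed approaches, a rough back-of-the-envelope calculation helps. The batch below is hypothetical (four cells with gene counts spanning the 200 to 16,000 range), and the block size of 128 is an assumed value for illustration, not a documented TritonSigmoid setting.

```python
import math

# Hypothetical per-cell gene counts, chosen to span the 200 to 16,000 range.
lengths = [200, 1500, 4000, 16000]
max_len = max(lengths)

padded_tokens = len(lengths) * max_len  # tokens stored when padding to max length
packed_tokens = sum(lengths)            # tokens stored when sequences are packed
padding_fraction = 1 - packed_tokens / padded_tokens

block = 128  # assumed key-block size, for illustration only
total_blocks = len(lengths) * math.ceil(max_len / block)
needed_blocks = sum(math.ceil(n / block) for n in lengths)

print(f"padded={padded_tokens} packed={packed_tokens} padding={padding_fraction:.1%}")
print(f"key blocks containing real tokens: {needed_blocks} of {total_blocks}")
```

In this made-up batch, about two thirds of the padded tensor is padding, which is the memory cost. A padding-aware kernel still has to store those positions, but it only needs to visit the blocks that contain real tokens.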
Key insights
TritonSigmoid offers superior performance and stability for variable-length sequence attention, especially in biological modeling.
Principles
- Sigmoid attention enables multi-gene focus.
- Padding-aware kernels optimize variable-length sequences.
Method
The kernel computes attention blockwise, similar to FlashAttention. It handles variable lengths by padding sequences to the maximum length and skipping fully padded blocks, an approach chosen to integrate cleanly with `torch.compile`.
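The idea of skipping fully padded blocks can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TritonSigmoid kernel: the function names, the block size of 4, and the toy shapes are all hypothetical, and only key-side skipping for a single sequence is shown. Because sigmoid weights are not normalized across keys, the partial results from each block can simply be summed.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_reference(q, k, v, length):
    """Score every key up to max length, then zero out padded keys with a mask."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = [[0.0] * len(v[0]) for _ in q]
    for i, qi in enumerate(q):
        for j in range(len(k)):
            mask = 1.0 if j < length else 0.0
            w = mask * sigmoid(scale * dot(qi, k[j]))
            for c in range(len(v[0])):
                out[i][c] += w * v[j][c]
    return out

def blockwise_skip_padding(q, k, v, length, block=4):
    """Walk over key blocks, stopping once a block is entirely padding."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = [[0.0] * len(v[0]) for _ in q]
    blocks_visited = 0
    for start in range(0, len(k), block):
        if start >= length:
            break  # this block and all later ones contain only padding
        blocks_visited += 1
        stop = min(start + block, length)  # drop the padded tail of a partial block
        for i, qi in enumerate(q):
            for j in range(start, stop):
                w = sigmoid(scale * dot(qi, k[j]))
                for c in range(len(v[0])):
                    out[i][c] += w * v[j][c]
    return out, blocks_visited

# Toy data: one sequence padded to 16 positions, of which only 5 are real.
rng = random.Random(0)
max_len, length, dim = 16, 5, 4

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

q, k, v = rand_matrix(max_len, dim), rand_matrix(max_len, dim), rand_matrix(max_len, dim)

reference = dense_reference(q, k, v, length)
result, visited = blockwise_skip_padding(q, k, v, length)
max_error = max(abs(a - b) for ra, rb in zip(reference, result) for a, b in zip(ra, rb))
print(f"blocks visited: {visited} of {max_len // 4}, max error: {max_error:.2e}")
```

Both functions produce the same output, but the blockwise version touches only the key blocks that contain real tokens. The tensor shapes stay fixed at the padded maximum length throughout.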
In practice
- Use TritonSigmoid for single-cell genomics.
- Apply it to other data with variable sequence lengths.
Topics
- TritonSigmoid
- Sigmoid Attention
- GPU Kernel
- Padding-Aware Attention
- Single-Cell Foundation Models
Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.