Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer
Summary
Otters++ is an energy-efficient optical spiking Transformer that addresses the practical overheads of time-to-first-spike (TTFS) coding in spiking neural networks (SNNs). It repurposes the natural signal decay of a custom In2O3 optoelectronic synapse to compute the TTFS temporal term directly, eliminating explicit digital decay calculations. To make Transformer models trainable, Otters++ establishes a layer-wise functional equivalence with a quantized neural network (QNN) and uses a hybrid SNN-forward/QNN-backward training method, with knowledge distillation and noise-aware forward sampling to handle device variations. Evaluated on the GLUE benchmark, Otters++ achieves an average score of 84.17% and consumes 14.2 mJ per layer, which is 1.84x to 5.68x more efficient than prior spiking Transformer baselines.
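To make the "TTFS temporal term" concrete, here is a minimal Python sketch of the digital computation that the device decay is meant to replace. It assumes an exponential decay kernel with a hypothetical time constant `tau`; the summary does not state the actual device curve, and the function names are illustrative only.

```python
import math

def ttfs_decay(spike_time, tau=1.0):
    """Temporal term for one input spike.

    Assumption: an exponential kernel exp(-t / tau), so earlier spikes
    contribute more. The real In2O3 device response may differ.
    """
    return math.exp(-spike_time / tau)

def ttfs_weighted_sum(weights, spike_times, tau=1.0):
    """Sum of weight * decay(spike_time) over inputs that spiked.

    A spike time of None marks an input that never fired.
    """
    total = 0.0
    for w, t in zip(weights, spike_times):
        if t is None:
            continue  # silent inputs contribute nothing
        total += w * ttfs_decay(t, tau)
    return total
```

In a digital implementation, every call to `ttfs_decay` costs arithmetic and memory traffic. The idea described above is that the synapse's physical decay supplies this term instead.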
Key takeaway
For AI hardware engineers designing neuromorphic accelerators for large language models, Otters++ demonstrates a path to low-power inference. Consider integrating optoelectronic synapses to offload temporal decay computations from digital logic: this approach reduces arithmetic, memory access, and communication energy costs, achieving 1.84x to 5.68x energy savings over existing spiking Transformers while maintaining competitive accuracy.
Key insights
Otters++ leverages optoelectronic device decay for energy-efficient time-to-first-spike computation in spiking Transformers.
Principles
- Repurpose hardware "bugs" as computational primitives.
- Establish functional equivalence for stable SNN training.
- Incorporate device noise into training for robustness.
Method
A hybrid SNN-forward/QNN-backward training framework uses device-faithful SNN computation in the forward pass and QNN straight-through gradients in the backward pass, augmented with knowledge distillation and noise-aware forward sampling.
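The forward/backward split can be illustrated with a small straight-through estimator sketch in plain Python. This is not the authors' implementation. A noisy uniform quantizer stands in for the device-faithful SNN forward pass, and the names `quantize`, `forward`, `backward`, `levels`, and `noise_std` are hypothetical.

```python
import math
import random

def quantize(x, levels=4):
    """Uniform quantizer on [0, 1] with `levels` steps (round half up)."""
    x = min(max(x, 0.0), 1.0)
    return math.floor(x * levels + 0.5) / levels

def forward(x, w, noise_std=0.0, rng=random):
    """Forward pass: quantize the activation, optionally add sampled
    Gaussian noise (a stand-in for device variation), then scale by w.
    Returns the output and the activation value actually used."""
    q = quantize(x)
    if noise_std > 0.0:
        q += rng.gauss(0.0, noise_std)
    return w * q, q

def backward(x, w, q_used, grad_out=1.0):
    """Backward pass with a straight-through estimator: the quantizer is
    treated as the identity inside [0, 1] and as flat outside it."""
    pass_through = 1.0 if 0.0 <= x <= 1.0 else 0.0
    grad_x = grad_out * w * pass_through
    grad_w = grad_out * q_used  # uses the noisy forward value
    return grad_x, grad_w
```

Sampling noise only in the forward pass exposes the weights to device-like variation during training, while the backward pass stays smooth and deterministic.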
In practice
- Use In2O3 TFTs for analog temporal decay.
- Quantize Key/Value projections to 1-bit.
- Employ system-level energy modeling for realistic assessment.
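As one illustration of the 1-bit Key/Value point above, the sketch below binarizes a vector to signs with a single scale factor. The sign-plus-mean-absolute-value scheme is an assumption borrowed from common binary-network practice, not a detail confirmed by the summary, and `binarize` and `dequantize` are hypothetical names.

```python
def binarize(values):
    """Map each value to +1 or -1 and keep one shared scale.

    Assumption: scale = mean absolute value; zero maps to +1.
    """
    if not values:
        return [], 0.0
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return signs, scale

def dequantize(signs, scale):
    """Reconstruct approximate values from the 1-bit codes."""
    return [s * scale for s in signs]
```

For example, `binarize([0.5, -1.5, 1.0])` yields the signs `[1, -1, 1]` with a scale of `1.0`, so each stored element needs only one bit plus the shared scale.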
Topics
- Spiking Neural Networks
- Optical Computing
- Time-to-First-Spike
- Transformer Models
- Energy Efficiency
- Neuromorphic Hardware
- In2O3 Optoelectronic Synapse
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.