SecureRouter: Encrypted Routing for Efficient Secure Inference
Summary
SecureRouter is an end-to-end encrypted routing and inference framework that accelerates secure Transformer inference by enabling input-adaptive model selection under encryption. Prior privacy-preserving inference systems run a single, fixed Transformer model on every encrypted input, which leads to high latency and cost. SecureRouter instead integrates a secure router with an MPC-optimized model pool, coordinating routing, inference, and protocol execution while keeping both data and models confidential. The framework has two components: an MPC-cost-aware secure router trained to predict per-model utility and cost from encrypted features, and an MPC-optimized model pool co-trained to minimize MPC communication and computation overhead. Experiments on GLUE benchmarks show that SecureRouter reduces latency by up to 1.95x with negligible accuracy loss compared to fixed-model MPC baselines, and achieves nearly 50% lower average latency than the SecFormer framework.
Key takeaway
For AI Architects and Research Scientists deploying privacy-preserving Transformer models, SecureRouter offers a practical way to reduce inference latency and cost. By dynamically routing encrypted inputs to an MPC-optimized model pool, it achieves up to a 1.95x speed-up with negligible accuracy loss. Consider this input-adaptive approach to overcome the computational bottlenecks of traditional MPC-based inference, especially in latency-critical applications such as medicine and finance.
Key insights
SecureRouter accelerates encrypted Transformer inference via input-adaptive model selection and an MPC-optimized model pool.
Principles
- Input-adaptive model selection improves efficiency.
- MPC cost is dominated by non-linear operations.
- Co-training the router and the model pool is crucial.
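To see why non-linear operations dominate MPC cost, consider two-party additive secret sharing, the kind of scheme that frameworks such as CrypTen build on. Additions run locally on each party's shares, while multiplying two secret values needs a round of communication; non-linear functions such as softmax or GELU are assembled from many such interactive steps. The sketch below is a toy illustration, not SecureRouter's protocol: the ring size, the helper names (`share`, `add_local`, `mul_beaver`), and the in-process trusted dealer are all assumptions made for this example.

```python
import secrets

MOD = 2 ** 64  # ring size; an assumption for this sketch


def share(x):
    """Split x into two additive shares that sum to x modulo MOD."""
    r = secrets.randbelow(MOD)
    return (r, (x - r) % MOD)


def reconstruct(shares):
    """Recombine shares into the plaintext value."""
    return sum(shares) % MOD


def add_local(a, b):
    """Addition: each party adds its own shares; no messages are exchanged."""
    return tuple((ai + bi) % MOD for ai, bi in zip(a, b))


def mul_beaver(x, y, log):
    """Multiply two shared values using a Beaver triple (u, v, w = u * v).

    A trusted dealer is simulated in-process. The parties must open
    d = x - u and e = y - v, which is the communication step.
    """
    u, v = secrets.randbelow(MOD), secrets.randbelow(MOD)
    us, vs, ws = share(u), share(v), share(u * v % MOD)
    d = reconstruct(tuple((xi - ui) % MOD for xi, ui in zip(x, us)))
    e = reconstruct(tuple((yi - vi) % MOD for yi, vi in zip(y, vs)))
    log.append("open d, e")  # record one interactive round
    # x * y = w + d * v + e * u + d * e; only party 0 adds the public d * e
    z = [(wi + d * vi + e * ui) % MOD for wi, vi, ui in zip(ws, vs, us)]
    z[0] = (z[0] + d * e) % MOD
    return tuple(z)


rounds = []
a, b = share(7), share(6)
total = reconstruct(add_local(a, b))             # 13, zero rounds
product = reconstruct(mul_beaver(a, b, rounds))  # 42, one round logged
```

Here `rounds` stays empty after the addition and gains one entry per multiplication, which is why architectures with fewer or cheaper non-linearities tend to be attractive under MPC.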
Method
SecureRouter employs an offline training phase to optimize an MPC-cost-aware router and an MPC-optimized model pool, followed by an online inference phase where the router dynamically selects models from encrypted features using a secure argmax protocol and oblivious transfer.
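The online routing decision can be sketched in plaintext as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the score `utility - trade_off * cost`, the `Candidate` type, the model names, and all numbers are hypothetical, and in the real system the argmax would run on encrypted (secret-shared) values rather than on floats in the clear.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    utility: float  # router's predicted task quality (hypothetical values)
    cost: float     # router's predicted MPC cost, arbitrary units


def route(candidates, trade_off=0.5):
    """Pick the candidate with the highest utility - trade_off * cost.

    Plaintext stand-in for the secure argmax step; the scoring rule
    itself is an assumption made for this sketch.
    """
    return max(candidates, key=lambda c: c.utility - trade_off * c.cost)


pool = [
    Candidate("small", 0.80, 0.2),
    Candidate("medium", 0.86, 0.5),
    Candidate("large", 0.90, 1.0),
]

cheap_choice = route(pool, trade_off=0.5)     # cost weighs heavily
accurate_choice = route(pool, trade_off=0.0)  # cost ignored
```

With these made-up numbers, a high `trade_off` selects the small model and a `trade_off` of zero selects the large one, which is the behavior an input-adaptive router exploits: inputs whose predicted utility barely improves on larger models are served by cheaper ones.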
In practice
- Use CrypTen for secret-sharing-based ML in the semi-honest setting.
- Optimize model architectures for MPC communication.
- Employ Gumbel-Softmax for differentiable routing.
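The Gumbel-Softmax idea in the last bullet can be sketched with the standard library alone. A hard argmax over router logits has no useful gradient, so training replaces it with a softmax over logits perturbed by Gumbel noise and divided by a temperature `tau`. The function below shows only the sampling step; the function name, the default temperature, and the example logits are assumptions, and an actual training loop would use an autograd framework (PyTorch, for instance, provides `torch.nn.functional.gumbel_softmax`).

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample a relaxed one-hot vector: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs stay finite
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical router logits for a three-model pool
weights = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
```

Lower values of `tau` push the output closer to a one-hot selection, while higher values spread weight across models; annealing the temperature during training is a common design choice.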
Topics
- Secure Multi-Party Computation
- Encrypted Routing
- Transformer Inference
- MPC-Optimized Model Pool
- Input-Adaptive Model Selection
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.