CFRNet: Cycle-Consistent Fixed-Point Training for Real-Time Blind Face Restoration on Consumer Embedded NPUs
Summary
CFRNet is a 2.0M-parameter ResNet-style network for real-time blind face restoration on consumer embedded NPUs at $256\times 256$ resolution. It introduces Cycle-Consistent Fixed-Point Training (CCFP), which trains the network to act as a fixed-point operator, so that applying it again to an already restored face leaves the face unchanged. CCFP combines progressive multi-cycle supervision, an idempotence loss, and a re-degradation cycle loss, none of which adds inference cost. On a 300-image test set, CFRNet achieves the best perceptual score (LPIPS 0.250 at three cycles, 31% lower than after one cycle) and the best PSNR and SSIM at two cycles among deployment-compatible baselines. It runs in approximately 23 ms per cycle in INT8 on a HiSilicon Hi3402 NPU, where other "Lite" baselines fail to compile. The cycle count $k$ serves as a quality knob: PSNR peaks at $k=2$, while LPIPS keeps improving up to $k=3$.
Key takeaway
If you are a machine learning engineer deploying face restoration on embedded NPUs, CFRNet's Cycle-Consistent Fixed-Point Training (CCFP) is worth considering: it pairs NPU compatibility with strong perceptual quality (LPIPS 0.250 at $k=3$). You can change the cycle count $k$ at inference without retraining, choosing $k=3$ for perceptual quality or $k=2$ for pixel fidelity. Each cycle costs approximately 23 ms in INT8 on the Hi3402 NPU, so pick $k$ against your application's latency budget.
Key insights
Training a face restorer as a fixed-point operator ensures stable, iterative refinement on resource-constrained NPUs.
Principles
- Iterative inference benefits from fixed-point training.
- A perception-distortion trade-off exists within a single model, controlled by the cycle count $k$.
- Supervising local facial components reduces artifacts.
Method
Cycle-Consistent Fixed-Point Training (CCFP) uses progressive multi-cycle supervision, an idempotence loss, and a re-degradation cycle loss to train a generator as a fixed-point operator.
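The three CCFP terms can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not give the exact loss formulas, so the forms below (L1 distance, uniform averaging over cycles, idempotence measured on the clean target, a known `degrade` function) and the names `ccfp_loss`, `w_idem`, and `w_cycle` are all assumptions made for illustration. Images are flattened to plain lists of floats to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence

Image = List[float]  # toy stand-in for an image tensor, flattened to floats


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def ccfp_loss(
    generator: Callable[[Image], Image],
    degrade: Callable[[Image], Image],
    degraded: Image,
    clean: Image,
    cycles: int = 3,
    w_idem: float = 1.0,   # hypothetical weight, not from the source
    w_cycle: float = 1.0,  # hypothetical weight, not from the source
) -> Dict[str, float]:
    """Return the three assumed CCFP terms and their weighted sum."""
    # Progressive multi-cycle supervision (assumed form): the output of
    # every cycle, not only the last one, is compared with the clean target.
    out = degraded
    multi_cycle = 0.0
    for _ in range(cycles):
        out = generator(out)
        multi_cycle += l1(out, clean)
    multi_cycle /= cycles

    # Idempotence (assumed form): a clean face should pass through unchanged.
    idempotence = l1(generator(clean), clean)

    # Re-degradation cycle (assumed form): degrading the first restoration
    # should reproduce the degraded input.
    re_degradation = l1(degrade(generator(degraded)), degraded)

    total = multi_cycle + w_idem * idempotence + w_cycle * re_degradation
    return {
        "multi_cycle": multi_cycle,
        "idempotence": idempotence,
        "re_degradation": re_degradation,
        "total": total,
    }
```

All three terms are computed only during training, which is consistent with the claim that CCFP adds no inference cost. A generator that already maps every input to the clean target scores zero on each term, while an identity generator scores zero only on the idempotence term.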
In practice
- Use $k=3$ for best perceptual quality (LPIPS).
- Use $k=2$ for best pixel fidelity (PSNR/SSIM).
- Add nose-region supervision to reduce mid-face artifacts.
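Choosing $k$ at inference amounts to applying the same network repeatedly, which can be sketched as follows. The helper names (`restore`, `cycles_for`, `estimated_latency_ms`) are hypothetical, and the latency estimate simply multiplies the reported figure of approximately 23 ms per cycle by $k$, ignoring any pre- or post-processing overhead.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in for an image tensor, flattened to floats

PER_CYCLE_MS = 23.0  # approximate INT8 latency per cycle reported for the Hi3402 NPU


def restore(generator: Callable[[Image], Image], image: Image, k: int) -> Image:
    """Apply the same generator k times, feeding each output back in."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = image
    for _ in range(k):
        out = generator(out)
    return out


def cycles_for(goal: str) -> int:
    """Map a quality goal to the cycle count suggested in the summary."""
    table = {"perceptual": 3, "fidelity": 2}  # LPIPS best at 3, PSNR/SSIM at 2
    if goal not in table:
        raise ValueError(f"unknown goal: {goal!r}")
    return table[goal]


def estimated_latency_ms(k: int, per_cycle_ms: float = PER_CYCLE_MS) -> float:
    """Rough estimate that assumes latency scales linearly with k."""
    return k * per_cycle_ms
```

Under this linear assumption, $k=2$ costs roughly 46 ms and $k=3$ roughly 69 ms per face, so the choice between the two depends on the frame budget as much as on the preferred metric.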
Topics
- Blind Face Restoration
- Neural Processing Units
- Cycle-Consistent Fixed-Point Training
- Real-time Inference
- INT8 Quantization
- Embedded AI
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.