ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots
Summary
ASTRA is an end-to-end training simulator that automates simpilot roles for Air Traffic Control Officer (ATCO) training, addressing capacity limitations and localization challenges in Singaporean operational contexts. It features a speech-to-speech pipeline that transcribes ATCO trainee speech, interprets instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. Its fine-tuned Automatic Speech Recognition (ASR) component reaches a Word Error Rate (WER) of 23.45% on Singaporean-accented aviation speech, compared with 107.80% for existing off-the-shelf systems (WER can exceed 100% when a transcript contains many inserted words). ASTRA also integrates an AI-assisted performance evaluation framework, achieving post-optimization scores of 91.7% for accuracy, 88.2% for brevity, and 86.9% for completeness in assessing trainee radiotelephony communications. Built on open-source tools such as DSPy and Unsloth, the system enables scalable, standardized ATCO assessment while reducing instructor workload.
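To make the WER figures concrete, here is a minimal sketch of the standard word-level definition: substitutions, deletions, and insertions divided by the number of reference words. The function name and the example phrases are illustrative assumptions, not taken from the paper, and the paper's exact text normalization is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative phrases only (hypothetical, not from the paper's data).
perfect = word_error_rate("turn left heading two seven zero",
                          "turn left heading two seven zero")
partial = word_error_rate("turn left heading two seven zero",
                          "turn left heading to seven")
inflated = word_error_rate("roger", "raja that is")
```

Because the denominator is the reference length, a hypothesis with many extra words can push the ratio above 1, which is how a figure such as 107.80% can arise.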
Key takeaway
MLOps engineers deploying speech-based AI in specialized, accented operational environments should prioritize fine-tuning ASR and TTS models on local-accent and domain-specific data. The off-the-shelf systems evaluated here reached a WER of 107.80%, against 23.45% after fine-tuning, so relying on frontier models alone risks unacceptable error rates. Combine deterministic rules with LLM-based analysis in a hybrid evaluation framework, so that assessment covers both precise compliance and contextual understanding while staying objective and scalable.
Key insights
Localized speech models and hybrid AI are critical for scalable, objective ATCO training simulation.
Principles
- Fine-tuning ASR with local accents improves domain accuracy.
- Hybrid rule-LLM evaluation offers robust performance assessment.
- Parameter-efficient fine-tuning adapts TTS models effectively.
Method
The system processes speech through a staged pipeline: audio preprocessing, voice activity detection (VAD), fine-tuned ASR, two-stage CIU (RegEx + DSPy/LLM), parallel response generation, and domain-adapted TTS.
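The two-stage idea (RegEx plus an LLM) can be sketched as follows. This is a minimal illustration under several assumptions: the patterns, the intent names, the digit-normalized transcript format, and the ordering (regex first, model only as a fallback) are all hypothetical, and `llm_fallback` is a plain callable standing in for whatever DSPy/LLM module the system actually uses.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns; real phraseology coverage would be far broader.
PATTERNS = [
    ("climb", re.compile(
        r"^(?P<callsign>[a-z]+ \d+) climb (?:to )?flight level (?P<level>\d+)$")),
    ("descend", re.compile(
        r"^(?P<callsign>[a-z]+ \d+) descend (?:to )?flight level (?P<level>\d+)$")),
    ("heading", re.compile(
        r"^(?P<callsign>[a-z]+ \d+) turn (?P<direction>left|right) "
        r"heading (?P<heading>\d{3})$")),
]


def parse_instruction(
    transcript: str,
    llm_fallback: Optional[Callable[[str], Optional[Dict[str, str]]]] = None,
) -> Optional[Dict[str, str]]:
    """Stage 1: cheap deterministic regex match. Stage 2: optional model fallback."""
    text = " ".join(transcript.lower().split())  # normalize case and whitespace
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"intent": intent, "stage": "regex", **match.groupdict()}
    if llm_fallback is not None:
        parsed = llm_fallback(text)
        if parsed is not None:
            return {**parsed, "stage": "llm"}
    return None


# Stub standing in for an LLM call (assumption: it returns a dict or None).
def fake_llm(text: str) -> Optional[Dict[str, str]]:
    return {"intent": "unrecognized_request"}


standard = parse_instruction("Singapore 123 climb flight level 350")
freeform = parse_instruction("singapore 123 say again your position", fake_llm)
unparsed = parse_instruction("singapore 123 say again your position")
```

Routing standard phraseology through the deterministic stage keeps latency and cost low, leaving the model for non-standard utterances.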
In practice
- Adapt ASR models using local accent and aviation-specific corpora.
- Employ DSPy with LLMs for structured command parsing and response generation.
- Combine rule-based and LLM evaluation for communication assessment.
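One way to picture the hybrid evaluation is the sketch below, which scores the three criteria named in the summary (accuracy, brevity, completeness). Which criterion is handled by rules and which by the model, the scoring formulas, and the `llm_judge` callable are all assumptions made for illustration; the paper's actual framework may divide the work differently.

```python
from typing import Callable, Dict, List


def rule_checks(transmission: str, required_terms: List[str],
                max_words: int) -> Dict[str, float]:
    """Deterministic checks: required items present, word budget respected."""
    words = transmission.lower().split()
    padded = " " + " ".join(words) + " "  # pad so terms match on word boundaries
    present = [t for t in required_terms if f" {t.lower()} " in padded]
    completeness = len(present) / len(required_terms) if required_terms else 1.0
    # Full marks within the budget, proportional penalty beyond it.
    brevity = 1.0 if len(words) <= max_words else max_words / len(words)
    return {"completeness": completeness, "brevity": brevity}


def evaluate(transmission: str, required_terms: List[str], max_words: int,
             llm_judge: Callable[[str], float]) -> Dict[str, float]:
    """Combine rule-based scores with a model-judged accuracy score in [0, 1]."""
    scores = rule_checks(transmission, required_terms, max_words)
    scores["accuracy"] = min(1.0, max(0.0, llm_judge(transmission)))
    return scores


# Stub judge standing in for an LLM call (hypothetical fixed score).
def stub_judge(text: str) -> float:
    return 0.9


good = evaluate("singapore 123 climb flight level 350",
                ["singapore 123", "flight level 350"], 8, stub_judge)
weak = evaluate("uh singapore 123 please climb now",
                ["singapore 123", "flight level 350"], 3, stub_judge)
```

Keeping the rule-based scores separate from the model-judged score makes each result traceable: a low completeness value points to a specific missing item, while the model score covers judgments that fixed rules cannot express.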
Topics
- ATCO Training Simulators
- Automatic Speech Recognition
- Text-to-Speech
- Large Language Models
- Aviation Radiotelephony
- Performance Evaluation
Best for: NLP Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.