ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Aviation & Aerospace) · Depth: Expert, extended

Summary

ASTRA is an end-to-end training simulator that automates the simpilot role in Air Traffic Control Operator (ATCO) training, addressing capacity limits and localization challenges in the Singaporean operational context. Its speech-to-speech pipeline transcribes ATCO trainee speech, interprets the instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. The fine-tuned Automatic Speech Recognition (ASR) component achieves a Word Error Rate (WER) of 23.45% on Singaporean-accented aviation speech, compared with 107.80% for existing off-the-shelf systems. ASTRA also integrates an AI-assisted performance evaluation framework that assesses trainee radiotelephony communications, with post-optimization scores of 91.7% for accuracy, 88.2% for brevity, and 86.9% for completeness. Built on open-source tools such as DSPy and Unsloth, the system enables scalable, standardized ATCO assessment while reducing instructor workload.
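
A WER above 100%, such as the 107.80% baseline quoted above, is possible because inserted words count as errors while the denominator is only the number of reference words. The sketch below computes WER from a word-level edit distance to make that concrete. It is a generic illustration, not ASTRA's evaluation code, and the function name `word_error_rate` and the example phrases are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("turn left heading two seven zero",
                          "turn left heading two seven"))
    # Three errors against a single reference word gives a WER of 300%.
    print(word_error_rate("descend", "the send now"))
```

In the second call the recognizer hallucinates extra words, so the error count exceeds the reference length and the ratio passes 1.0.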

Key takeaway

MLOps engineers deploying speech-based AI in specialized, accented operational environments should prioritize extensive fine-tuning of ASR and TTS models on local-accent and domain-specific data; relying on frontier models alone can produce unacceptable error rates. A hybrid evaluation framework that combines deterministic rules with LLM-based analysis covers both precise compliance checks and contextual understanding, giving users an objective and scalable performance assessment.
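
As a rough illustration of the hybrid idea, the sketch below pairs two deterministic checks with an optional, pluggable LLM judge. Everything in it is an assumption made for illustration: the names `evaluate_transmission`, `Evaluation`, and `FILLERS`, the scoring formulas, and the stubbed judge are hypothetical and do not reproduce ASTRA's actual metrics or prompts.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical filler words that hurt brevity in this toy scoring scheme.
FILLERS = {"uh", "um", "er", "erm", "please", "okay"}


@dataclass
class Evaluation:
    completeness: float         # share of required items that were spoken
    brevity: float              # 1.0 means no filler words at all
    llm_comment: Optional[str]  # free-text judgement, if a judge was supplied


def evaluate_transmission(
    transcript: str,
    required_items: list[str],
    llm_judge: Optional[Callable[[str], str]] = None,
) -> Evaluation:
    words = transcript.lower().split()
    text = " ".join(words)

    # Deterministic rule 1: every required phrase must appear verbatim.
    if required_items:
        found = sum(1 for item in required_items if item.lower() in text)
        completeness = found / len(required_items)
    else:
        completeness = 1.0

    # Deterministic rule 2: penalise filler words in proportion to length.
    if words:
        brevity = 1.0 - sum(w in FILLERS for w in words) / len(words)
    else:
        brevity = 0.0

    # Contextual stage: delegate to an LLM-backed callable when one is given.
    comment = llm_judge(transcript) if llm_judge is not None else None
    return Evaluation(completeness, brevity, comment)


if __name__ == "__main__":
    result = evaluate_transmission(
        "uh singapore one two three descend flight level one two zero",
        ["singapore one two three", "descend", "flight level one two zero"],
        llm_judge=lambda t: "stubbed judgement",  # stand-in for a real LLM call
    )
    print(result)
```

Keeping the judge behind a plain callable means the deterministic checks can be unit-tested without any model access, while the LLM stage can be swapped or optimized separately.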

Key insights

Localized speech models and hybrid AI are critical for scalable, objective ATCO training simulation.

Method

The system processes speech through a sequence of stages: audio preprocessing, voice activity detection (VAD), fine-tuned ASR, two-stage CIU (RegEx + DSPy/LLM), parallel response generation, and domain-adapted text-to-speech (TTS).
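
A two-stage interpretation step of this kind can be sketched as a cheap pattern match with a model-backed fallback. The example below assumes the regular expressions run first and the LLM is consulted only when no pattern matches; that ordering, the `RULES` patterns, the `interpret` function, and the stubbed fallback are all hypothetical and are not taken from ASTRA. In a real system the fallback could wrap a DSPy module, but no DSPy calls are shown here.

```python
import re
from typing import Callable, Optional

DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}
# Two or three spoken digits; \b stops "nine" from matching inside "niner".
NUM = r"((?:(?:" + "|".join(DIGITS) + r")\b\s*){2,3})"

# Hypothetical stage-1 rules for a few common instruction shapes.
RULES = [
    ("climb", re.compile(r"\bclimb (?:to )?flight level " + NUM)),
    ("descend", re.compile(r"\bdescend (?:to )?flight level " + NUM)),
    ("heading", re.compile(r"\bturn (?:left|right) heading " + NUM)),
]


def interpret(
    transcript: str,
    llm_fallback: Optional[Callable[[str], dict]] = None,
) -> Optional[dict]:
    text = transcript.lower().strip()

    # Stage 1: deterministic patterns for well-formed phraseology.
    for intent, pattern in RULES:
        match = pattern.search(text)
        if match:
            value = "".join(DIGITS[w] for w in match.group(1).split())
            return {"intent": intent, "value": value, "stage": "regex"}

    # Stage 2: hand anything unmatched to the model-backed fallback.
    if llm_fallback is not None:
        result = dict(llm_fallback(text))
        result["stage"] = "llm"
        return result
    return None


if __name__ == "__main__":
    print(interpret("Singapore three two one climb flight level three two zero"))
    print(interpret(
        "Singapore three two one contact approach",
        llm_fallback=lambda t: {"intent": "contact", "value": "approach"},
    ))
```

The design choice here is that routine, rule-conformant instructions never incur model latency, and the fallback only sees the harder or non-standard transmissions.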

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.