[D] Telecom modernization on legacy OSS, what actually worked for ML data extraction
Summary
A telecom modernization project integrated machine learning into a legacy Operational Support System (OSS) stack from the early 2000s: a C++ core, Perl glue, and no APIs or event hooks. The hard part was extracting data from this live, mission-critical system, not developing the ML models. Three approaches failed: application-layer log parsing (broken by format drift), direct instrumentation of the legacy C++ binaries, and ETL polling of the database (which caused performance problems). Three approaches worked: Change Data Capture (CDC) via Debezium on the MySQL binlog, eBPF uprobes on C++ function calls, and DBI hooks on the Perl side. Data normalization also took significant effort because of fifteen years of format drift, repurposed columns, and timezone inconsistencies.
Key takeaway
For AI Architects or ML Engineers tasked with integrating machine learning into deeply entrenched legacy systems: prioritize non-invasive data extraction. The project's success will hinge on capture methods such as CDC, eBPF, or DBI hooks, and you should budget substantial effort for data normalization, because format drift and undocumented changes will be major hurdles.
Key insights
Extracting data from a legacy system for ML requires non-invasive capture methods, because the system offers no APIs or event hooks and cannot tolerate extra load or changes to its binaries.
Principles
- Avoid application-layer changes
- Prioritize non-invasive data capture
- Anticipate significant data normalization
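The normalization principle above is easiest to see with timestamps. The following is a minimal sketch, not the project's actual code: it assumes rows written before a cutover date were stored as naive local time and later rows as naive UTC. The cutover date, the offset, and the function name `to_utc` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: rows before CUTOVER were written as naive local time,
# rows from CUTOVER onward as naive UTC. Both values are hypothetical.
LEGACY_OFFSET = timezone(timedelta(hours=-6))  # fixed offset; ignores DST
CUTOVER = datetime(2012, 1, 1)


def to_utc(raw: str) -> datetime:
    """Parse a naive 'YYYY-MM-DD HH:MM:SS' string into an aware UTC datetime."""
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    # Pick the timezone the value was most likely written in.
    source_tz = LEGACY_OFFSET if naive < CUTOVER else timezone.utc
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)


print(to_utc("2009-06-01 12:00:00").isoformat())
print(to_utc("2015-06-01 12:00:00").isoformat())
```

A real pipeline would likely need a DST-aware zone and per-table rules rather than a single cutover, but the shape is the same: decide which convention a value was written under, then convert everything to one canonical form.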
Method
Use CDC (Debezium on the MySQL binlog) for database changes, eBPF uprobes for C++ function calls that never reach the database, and DBI hooks for the Perl glue, so that data is extracted without modifying the legacy application itself.
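To illustrate what the CDC path hands to downstream ML code, here is a sketch that flattens one Debezium change event into a simple record. It relies on the standard Debezium envelope fields (`before`, `after`, `source`, `op`, `ts_ms`); the function name, the output keys, and the sample table are hypothetical.

```python
import json
from typing import Any, Optional

# Debezium operation codes: create, update, delete, snapshot read.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def flatten_change_event(message: str) -> Optional[dict[str, Any]]:
    """Turn one Debezium JSON message into a flat record, or None for a tombstone."""
    event = json.loads(message)
    if event is None:  # tombstone: a null value emitted after a delete
        return None
    # With schemas enabled the envelope sits under "payload"; otherwise it is top-level.
    payload = event.get("payload", event)
    op = payload["op"]
    # A delete carries the last known row state in "before"; other ops use "after".
    row = payload["before"] if op == "d" else payload["after"]
    source = payload.get("source") or {}
    return {
        "table": source.get("table"),
        "op": OP_NAMES.get(op, op),
        "event_ts_ms": payload.get("ts_ms"),
        "row": row,
    }


sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 7, "status": "OPEN"},  # hypothetical table columns
        "source": {"db": "oss", "table": "tickets"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
})
print(flatten_change_event(sample))
```

Keeping this step small and pure makes it easy to test against recorded events before pointing it at the live topic.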
In practice
- Implement Debezium for MySQL binlog CDC
- Explore eBPF for C++ function call tracing
- Use DBI hooks for Perl data interception
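For the Debezium item in the list above, the sketch below builds the JSON body that would be posted to a Kafka Connect REST endpoint (`POST /connectors`) to register a MySQL connector. The property names follow the Debezium 2.x MySQL connector and may differ in other versions. Every value (hosts, user, password placeholder, server id, table names, topic names) is a hypothetical placeholder, not taken from the original project.

```python
import json
from typing import Any


def build_connector_request(name: str, db_host: str, tables: list[str]) -> dict[str, Any]:
    """Build a Kafka Connect registration body for a Debezium MySQL connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",
            "database.hostname": db_host,
            "database.port": "3306",
            "database.user": "cdc_reader",         # hypothetical read-only replication user
            "database.password": "<set-via-secret>",  # placeholder, never hard-code
            "database.server.id": "184054",        # must be unique among replication clients
            "topic.prefix": name,                  # prefix for the emitted Kafka topics
            "table.include.list": ",".join(tables),
            "snapshot.mode": "initial",
            "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
            "schema.history.internal.kafka.topic": f"{name}.schema-history",
        },
    }


request_body = build_connector_request("oss", "oss-db.internal", ["oss.tickets", "oss.alarms"])
print(json.dumps(request_body, indent=2))
```

Restricting `table.include.list` to the tables the models need keeps the connector's footprint small. Because Debezium reads the binlog as a replication client, it avoids issuing the repeated polling queries that caused problems for the ETL approach.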
Topics
- Legacy System Modernization
- ML Data Extraction
- Debezium CDC
- eBPF
- Perl DBI
Best for: AI Architect, AI Engineer, Machine Learning Engineer, MLOps Engineer, Data Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.