Principles and Practices of Large-Scale Code Analysis at Ant Group: A Data- and Logic-Oriented Approach

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Software Development & Engineering, Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy) · Depth: Expert, extended

Summary

Ant Group's CodeFuse-Query is a static code analysis system built for large-scale software development: it scans more than 10 billion lines of code daily and supports over 300 distinct analysis tasks across 9 programming languages. Its design combines two parts. Domain Optimized System Design covers resource optimization, data reusability, and incremental code extraction. Logic Oriented Computation Design uses Datalog and a two-tiered schema, COREF, to turn source code into data facts, over which complex analysis tasks are expressed in Gödel, a dedicated query language. The system is reported to be robust, scalable, and efficient, and it targets the problems faced by organizations with more than ten thousand developers. The project is open source, inviting further work in the field.
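To make the "source code as data facts" idea concrete, here is a minimal Python sketch. It assumes a toy two-relation schema, not COREF (whose actual tables are not described in this summary), and uses the standard `ast` module to turn one Python file into `FunctionFact` and `CallFact` tuples. Those names, and the choice to record only calls made by simple name, are illustrative assumptions.

```python
import ast
from typing import NamedTuple, Set, Tuple


class FunctionFact(NamedTuple):
    # Hypothetical relation: function(name, file, line)
    name: str
    file: str
    line: int


class CallFact(NamedTuple):
    # Hypothetical relation: call(caller, callee)
    caller: str
    callee: str


def extract_facts(source: str, file: str) -> Tuple[Set[FunctionFact], Set[CallFact]]:
    """Turn one Python file into two flat relations (toy schema, not COREF)."""
    tree = ast.parse(source, filename=file)
    functions: Set[FunctionFact] = set()
    calls: Set[CallFact] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.add(FunctionFact(node.name, file, node.lineno))
            # Record calls by simple name anywhere inside this function node
            # (nested functions included); method calls like x.f() are skipped.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(CallFact(node.name, inner.func.id))
    return functions, calls


SOURCE = """
def parse(text):
    return text.split()

def handle(req):
    return parse(req)
"""

functions, calls = extract_facts(SOURCE, "app.py")
print(sorted(functions))
print(sorted(calls))
```

Once code is stored as plain relations like these, an analysis becomes a query over data rather than a bespoke traversal of a syntax tree, which is the shift the summary describes.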

Key takeaway

For engineering leads managing large, multi-language codebases, CodeFuse-Query offers a way past the scaling limits of traditional static analysis. Its data-centric, Datalog-based approach is worth evaluating wherever scalable, efficient analysis matters, for example in change impact assessment or in preparing training data for large language models (LLMs). Incremental extraction and the custom query language Gödel can improve both developer productivity and the maintainability of analyses. The open-source implementation can be explored and adapted to complex, organization-specific analysis needs.
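The summary credits part of the system's efficiency to incremental code extraction but does not describe the mechanism. As a minimal sketch, assuming per-file caching keyed by a SHA-256 content hash, the idea looks like this: unchanged files reuse their previously extracted facts, and only changed files are re-parsed. `IncrementalExtractor` and `toy_extract` are hypothetical names, and the one-fact-per-line extractor is a stand-in for a real parser.

```python
import hashlib
from typing import Callable, Dict, FrozenSet, Tuple

Fact = Tuple[str, ...]


class IncrementalExtractor:
    """Re-extract facts only for files whose content hash changed (hypothetical sketch)."""

    def __init__(self, extract: Callable[[str, str], FrozenSet[Fact]]):
        self._extract = extract
        self._cache: Dict[str, Tuple[str, FrozenSet[Fact]]] = {}  # path -> (digest, facts)
        self.extractions = 0  # how many files were actually re-parsed so far

    def update(self, files: Dict[str, str]) -> FrozenSet[Fact]:
        # Drop cached entries for files that no longer exist.
        for path in list(self._cache):
            if path not in files:
                del self._cache[path]
        for path, source in files.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            cached = self._cache.get(path)
            if cached is None or cached[0] != digest:
                self._cache[path] = (digest, self._extract(path, source))
                self.extractions += 1
        # The fact base is the union of per-file facts, reused where unchanged.
        return frozenset().union(*(facts for _, facts in self._cache.values()))


def toy_extract(path: str, source: str) -> FrozenSet[Fact]:
    # Stand-in extractor: one fact per non-blank line.
    return frozenset((path, line.strip()) for line in source.splitlines() if line.strip())


extractor = IncrementalExtractor(toy_extract)
first = extractor.update({"a.py": "x = 1", "b.py": "y = 2"})
second = extractor.update({"a.py": "x = 1", "b.py": "y = 3"})  # only b.py is re-parsed
print(extractor.extractions, sorted(second))
```

Hashing content rather than trusting timestamps keeps the cache correct across checkouts and rebuilds; a production system would also need to invalidate facts that depend on other files, which this sketch ignores.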

Key insights

CodeFuse-Query reframes static code analysis as a data computation task in order to achieve efficiency at scale.

Principles

Method

Formulate each analysis task in Gödel; the system generates an optimized Datalog execution plan, compiles it, and executes it against the extracted code facts to produce the analysis results.
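The method can be illustrated with the kind of Datalog rules a change-impact query might reduce to. The sketch is not Gödel and not CodeFuse-Query's planner: it hand-evaluates two recursive rules in Python with semi-naive iteration, a standard Datalog evaluation strategy in which each round joins only the tuples derived in the previous round. The relation names `call`/`reaches` and the functions `transitive_calls` and `impacted_by` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Edge = Tuple[str, str]


def transitive_calls(calls: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation of:
         reaches(X, Y) :- call(X, Y).
         reaches(X, Z) :- reaches(X, Y), call(Y, Z).
    """
    # Index call/2 on its first column so the join is a lookup, not a scan.
    callees: DefaultDict[str, Set[str]] = defaultdict(set)
    for caller, callee in calls:
        callees[caller].add(callee)
    reaches: Set[Edge] = set(calls)
    delta: Set[Edge] = set(calls)
    while delta:
        # Join only last round's new tuples; subtracting `reaches` ends cycles.
        new = {(x, z) for x, y in delta for z in callees.get(y, ())} - reaches
        reaches |= new
        delta = new
    return reaches


def impacted_by(changed: str, calls: Set[Edge]) -> Set[str]:
    """Functions that may transitively call `changed`: a crude change-impact query."""
    return {x for x, y in transitive_calls(calls) if y == changed}


CALLS = {("main", "handle"), ("handle", "parse"), ("parse", "tokenize"), ("cli", "parse")}
print(sorted(impacted_by("tokenize", CALLS)))
```

Writing the analysis as rules and leaving evaluation order to an engine is what makes a plan optimizable; here the only "optimization" is the first-column index and the delta-driven loop.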

Best for: CTO, VP of Engineering/Data, AI Scientist, Software Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.