Tensor Algebraic Property Skeletons: Amplifying Property-Based Testing for AI Compilers

2026-06-19 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Propilot is an LLM-driven, agentic property-based testing framework designed to amplify testing of deep learning (DL) compilers such as TVM. It targets semantic drift introduced by compiler optimizations by turning tensor algebra knowledge into executable property-based tests (PBTs). Using GPT 5.5, Propilot represents tensor algebra as reusable "property skeletons" that bundle operator constraints, shape and value rules, and oracle templates. It then instantiates these skeletons into concrete PBTs, each consisting of paired computation graphs, tensor inputs, and the expected semantic relations between them. A validation step filters out invalid or uninformative PBTs before execution, and its feedback guides subsequent generation. Evaluated on TVM with 212 operators and 20 property skeletons, Propilot generated 4,579 PBTs; compared with direct LLM-based generation, it reduced redundancy by 49% and eliminated invalid tests. The issues it found included semantic errors (50%) and numerical discrepancies (25%), showing that its tests exercise meaningful compiler behavior beyond mere input validation.
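
To make the skeleton idea concrete, here is a minimal sketch in Python, assuming a skeleton can be modeled as a record holding an operator constraint, a shape rule, a pair of computations, and a tolerance-based oracle. The names `PropertySkeleton`, `instantiate`, and the matmul-distributivity property are hypothetical illustrations, not Propilot's actual representation; plain nested lists stand in for tensors so the example needs only the standard library.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix product; assumes len(a[0]) == len(b).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Elementwise sum of two matrices with identical shapes.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


@dataclass
class PropertySkeleton:
    # Hypothetical fields mirroring the parts named in the summary.
    name: str
    operators: Tuple[str, ...]                                    # operator constraint
    shape_rule: Callable[[int, int, int], List[Tuple[int, int]]]  # input shapes from dims
    lhs: Callable[..., Matrix]   # stand-in for the first computation graph
    rhs: Callable[..., Matrix]   # stand-in for the paired computation graph
    rel_tol: float = 1e-9        # oracle: outputs must agree up to tolerance
    abs_tol: float = 1e-12


# A(B + C) == AB + AC, with A: m x k and B, C: k x n.
DISTRIBUTIVITY = PropertySkeleton(
    name="matmul_distributes_over_add",
    operators=("matmul", "add"),
    shape_rule=lambda m, k, n: [(m, k), (k, n), (k, n)],
    lhs=lambda a, b, c: matmul(a, add(b, c)),
    rhs=lambda a, b, c: add(matmul(a, b), matmul(a, c)),
)


def instantiate(sk: PropertySkeleton, m: int, k: int, n: int, seed: int):
    """Turn a skeleton into one concrete test: generated inputs plus a check."""
    rng = random.Random(seed)
    inputs = [rand_matrix(r, c, rng) for r, c in sk.shape_rule(m, k, n)]

    def check() -> bool:
        left, right = sk.lhs(*inputs), sk.rhs(*inputs)
        # Output shapes must match before values are compared.
        if [len(r) for r in left] != [len(r) for r in right]:
            return False
        return all(math.isclose(x, y, rel_tol=sk.rel_tol, abs_tol=sk.abs_tol)
                   for row_l, row_r in zip(left, right)
                   for x, y in zip(row_l, row_r))

    return inputs, check
```

Calling `_, check = instantiate(DISTRIBUTIVITY, 2, 3, 4, seed=0)` and then `check()` should return `True`. In an actual compiler test, `lhs` and `rhs` would be built as graphs and run through the compiler under test, so a `False` result would point to a transformation that broke the algebraic relation.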

Key takeaway

Machine learning engineers and AI scientists who develop or deploy deep learning compilers should integrate structured property-based testing to catch subtle semantic errors. Fuzzing or unconstrained LLM test generation on its own often yields invalid or uninformative tests. A better approach, following Propilot, is to use explicit tensor algebra property skeletons with rigorous validation, so that tests check whether compiler transformations preserve critical algebraic invariants and numerical precision, which in turn improves model reliability.
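
Acting on this advice requires an oracle that separates outright wrong results from small precision drift, the two kinds of findings named in the summary. Below is a minimal sketch assuming compiled and reference outputs are flattened to lists of floats; the function `triage` and its thresholds (`rel_tol`, `abs_tol`, `semantic_threshold`) are hypothetical and would need calibration per operator and data type.

```python
import math
from typing import Sequence


def triage(expected: Sequence[float], actual: Sequence[float],
           rel_tol: float = 1e-5, abs_tol: float = 1e-8,
           semantic_threshold: float = 1e-2) -> str:
    """Classify one reference-vs-compiled comparison (hypothetical thresholds)."""
    if len(expected) != len(actual):
        return "semantic error"  # wrong output size cannot be a rounding issue
    worst = 0.0
    for e, a in zip(expected, actual):
        if not (math.isfinite(e) and math.isfinite(a)):
            # Any disagreement in NaN/inf placement is treated as semantic.
            if not (e == a or (math.isnan(e) and math.isnan(a))):
                return "semantic error"
            continue
        if math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
            continue
        # Largest relative error seen, guarded against division by zero.
        worst = max(worst, abs(e - a) / max(abs(e), abs_tol))
    if worst == 0.0:
        return "pass"
    return "semantic error" if worst > semantic_threshold else "numerical discrepancy"
```

A sign flip, such as `triage([1.0, 2.0], [1.0, -2.0])`, lands in the semantic bucket, while a drift in the third decimal place is reported as a numerical discrepancy.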

Key insights

Tensor algebra property skeletons, combined with LLM-driven instantiation and validation, enable scalable, semantically rich DL compiler testing.

Principles

Semantic invariants are crucial for compiler correctness.
Explicit property skeletons guide LLM test generation.
Validation feedback improves test quality and relevance.

Method

Propilot stores tensor algebra knowledge as reusable property skeletons, instantiates them into PBTs via an LLM agent, validates the tests before execution, and uses the resulting feedback to prioritize further generation.
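
The validation step can be sketched as a pure function that screens a proposed test before anything is compiled and returns feedback text that could be passed back to the generating agent. This is a minimal sketch assuming a matmul-distributivity test, A(B + C) = AB + AC, described only by its shapes and a flag for degenerate inputs; `Candidate`, `validate`, and the three rejection categories (invalid, uninformative, redundant) are hypothetical simplifications of what the summary describes, not Propilot's API.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical description of an LLM-proposed distributivity test.
    skeleton: str
    shape_a: Tuple[int, int]
    shape_b: Tuple[int, int]
    shape_c: Tuple[int, int]
    inputs_all_zero: bool


def validate(c: Candidate, seen: Set[Tuple]) -> Tuple[bool, List[str]]:
    """Return (accepted, feedback) before any compilation happens."""
    feedback: List[str] = []
    # Invalid: shapes violate the operator constraints of A(B + C).
    if c.shape_b != c.shape_c:
        feedback.append("invalid: B and C must have identical shapes for add")
    if c.shape_a[1] != c.shape_b[0]:
        feedback.append("invalid: inner dimensions of A and B differ for matmul")
    if any(d <= 0 for d in c.shape_a + c.shape_b + c.shape_c):
        feedback.append("invalid: dimensions must be positive")
    # Uninformative: all-zero inputs satisfy the property trivially.
    if c.inputs_all_zero:
        feedback.append("uninformative: all-zero inputs make both sides zero")
    # Redundant: the same skeleton and shapes were already accepted.
    key = (c.skeleton, c.shape_a, c.shape_b, c.shape_c)
    if key in seen:
        feedback.append("redundant: identical skeleton and shapes already tested")
    accepted = not feedback
    if accepted:
        seen.add(key)  # only accepted tests count toward redundancy
    return accepted, feedback
```

Keeping validation free of compiler calls makes it cheap enough to run on every generated candidate, and the feedback strings give the agent a concrete reason to revise a rejected test.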

In practice

Encode algebraic properties as reusable test templates.
Implement pre-execution validation for generated tests.
Prioritize test generation based on coverage and failures.
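
Prioritizing generation by coverage and failures can be illustrated with a simple scoring rule. This sketch assumes the generator tracks, per (skeleton, operator) pair, how many tests ran and how many exposed a failure; the scoring formula and the `failure_weight` parameter are hypothetical choices, not Propilot's published policy.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (skeleton name, operator name)


def priority(pair: Pair, runs: Dict[Pair, int], failures: Dict[Pair, int],
             failure_weight: float = 2.0) -> float:
    n = runs.get(pair, 0)
    # Coverage term: pairs with fewer executed tests score higher.
    coverage_term = 1.0 / (1 + n)
    # Failure term: pairs whose tests exposed failures score higher.
    failure_rate = failures.get(pair, 0) / n if n else 0.0
    return coverage_term + failure_weight * failure_rate


def next_pair(candidates: Iterable[Pair], runs: Dict[Pair, int],
              failures: Dict[Pair, int]) -> Pair:
    # max() returns the first maximal element, so ties follow candidate order.
    return max(candidates, key=lambda p: priority(p, runs, failures))
```

Untested pairs start at a score of 1.0, so they are reached quickly, while pairs whose tests keep failing can outrank them; raising `failure_weight` shifts the budget from exploring new pairs toward probing known weak spots.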

Topics

Deep Learning Compilers
Property-Based Testing
Tensor Algebra
LLM-driven Agents
TVM
Semantic Verification

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.