Data Flow Control: Data Safety Policies for AI Agents
Summary
The Data Flow Control (DFC) framework addresses data safety for AI agents that generate SQL, orchestrate pipelines, and automate data analysis. A query can be semantically correct and still violate regulatory, privacy, or business constraints on how data may be combined and released. DFC provides a declarative way to specify policies over tuple-level data flows and to guarantee that they are enforced inside the DBMS query itself. It formalizes data safety as aggregate predicates over provenance monomials. The paper presents Passant, a portable query-rewriting layer that enforces DFC policies on five DBMS engines (DuckDB, Umbra, PostgreSQL, DataFusion, and SQL Server) without materializing provenance. Passant adds approximately 0% overhead and significantly outperforms the alternative enforcement methods it is compared against. The authors frame this as a step toward building data safety into core data infrastructure instead of relying on prompts or post-hoc checks. The DFC framework is available as open source.
Key takeaway
If you build agentic systems that generate SQL or automate data analysis, consider integrating Data Flow Control (DFC) to embed data safety directly into your DBMS. The framework moves policy enforcement from prompts into infrastructure and guarantees that queries comply with the regulatory, privacy, and business constraints you declare. Adopting DFC lets you enforce data safety policies and prevent violations with minimal performance impact, as shown by Passant's ~0% overhead across five DBMS engines.
Key insights
Data Flow Control (DFC) embeds data safety policies directly into DBMS queries for AI agents, ensuring compliance with near-zero performance overhead.
Principles
- Data safety is fundamentally a data infrastructure problem.
- Policy enforcement must be optimizer-invariant yet efficient.
- Aggregate predicates over provenance monomials formalize data safety.
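To make the third principle concrete, here is a minimal Python sketch built on assumptions of our own, not on the paper's formal definitions: a provenance monomial is modeled as the tuple of source rows jointly used in one derivation of an output tuple, and a policy collects one attribute from the rows of one table across all of an output tuple's monomials, aggregates it, and tests the result. The names `aggregate_predicate`, `visits`, `patient_id`, and `min_three_patients` are hypothetical and do not come from DFC or Passant.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a source tuple is a dict holding its table name and attribute values.
SourceTuple = Dict[str, object]
# A provenance monomial: the source tuples jointly used in one derivation of an
# output tuple (the product t1 * t2 * ... in semiring-provenance notation).
Monomial = Tuple[SourceTuple, ...]


def aggregate_predicate(
    table: str,
    attr: str,
    agg: Callable[[List[object]], object],
    check: Callable[[object], bool],
) -> Callable[[List[Monomial]], bool]:
    """Build a hypothetical policy: collect `attr` from every `table` tuple that
    appears in an output tuple's monomials, aggregate it with `agg`, test with `check`."""

    def policy(monomials: List[Monomial]) -> bool:
        values = [t[attr] for m in monomials for t in m if t["table"] == table]
        return check(agg(values))

    return policy


# Hypothetical source rows from a `visits` table.
visits = [
    {"table": "visits", "patient_id": pid, "cost": cost}
    for pid, cost in [(1, 10.0), (2, 20.0), (3, 30.0)]
]

# Provenance of two aggregate output tuples (e.g. SUM(cost) per group):
# group_a was derived from all three visits, group_b from the first one only.
group_a: List[Monomial] = [(v,) for v in visits]
group_b: List[Monomial] = [(visits[0],)]

# Hypothetical policy: each released group must involve at least 3 distinct patients.
min_three_patients = aggregate_predicate(
    "visits", "patient_id", lambda vals: len(set(vals)), lambda n: n >= 3
)

print(min_three_patients(group_a), min_three_patients(group_b))  # True False
```

The group derived from three distinct patients passes; the group derived from a single visit would be withheld. Evaluating the policy this way requires having the monomials in hand, which is exactly the provenance materialization that Passant is reported to avoid.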
Method
DFC formalizes data safety via aggregate predicates over provenance monomials. Passant then rewrites queries to enforce these policies without materializing provenance, integrating safety into the DBMS.
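This summary does not describe Passant's actual rewrite rules. As an illustration only, the sketch below uses Python's built-in `sqlite3` (not one of the five engines the paper evaluates) and a hypothetical policy, "every released group must aggregate at least K distinct patients". The rewrite evaluates that aggregate predicate inside the query with a `HAVING` clause, so the contributing tuple ids are never materialized; a second check that does materialize provenance confirms both approaches release the same groups. The table, `K`, and `rewrite_min_patients` are assumptions.

```python
import sqlite3

# Hypothetical table of patient visits (sqlite3 stands in for a real DBMS here).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits (patient_id INTEGER, zip TEXT, cost REAL);
    INSERT INTO visits VALUES
        (1, '02139', 100), (2, '02139', 250), (3, '02139', 80),
        (4, '94105', 300);
    """
)

# A query an agent might generate (hypothetical).
agent_sql = "SELECT zip, SUM(cost) AS total FROM visits GROUP BY zip"

K = 3  # hypothetical minimum number of distinct patients per released group


def rewrite_min_patients(sql: str) -> str:
    """Naive rewrite: evaluate COUNT(DISTINCT patient_id) over each group's
    contributing tuples inline and drop groups that fail the policy.
    Assumes `sql` ends with GROUP BY and has no HAVING/ORDER BY/LIMIT."""
    return sql + " HAVING COUNT(DISTINCT patient_id) >= ?"


safe_rows = conn.execute(rewrite_min_patients(agent_sql), (K,)).fetchall()
print(safe_rows)  # [('02139', 430.0)]

# Reference check that DOES materialize provenance: map each group to the set
# of patient ids that contributed to it, then apply the same predicate in Python.
provenance = {}
for zip_code, pid in conn.execute("SELECT zip, patient_id FROM visits"):
    provenance.setdefault(zip_code, set()).add(pid)
allowed = {z for z, pids in provenance.items() if len(pids) >= K}

assert {row[0] for row in safe_rows} == allowed
```

The string-appending rewrite is deliberately naive: it only handles queries that end in `GROUP BY` with no existing `HAVING`, `ORDER BY`, or `LIMIT`. A production rewriter would work on a parsed query or plan rather than on SQL text.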
In practice
- Enforce regulatory compliance in SQL queries.
- Prevent privacy violations by AI agents.
- Integrate data safety directly into DBMS.
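Agent-generated queries often join several tables, so each monomial of an output group contains one tuple from every joined table. The sketch below, again using `sqlite3` with a hypothetical schema and a hypothetical policy ("every patient contributing to a released group must have consent = 1"), expresses that universal condition as `MIN(p.consent) = 1`, the aggregate form of a Boolean AND over the group's monomials. None of these names come from DFC or Passant.

```python
import sqlite3

# Hypothetical schema: a consent flag per patient, plus visits per clinic.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, consent INTEGER);
    CREATE TABLE visits (patient_id INTEGER, clinic TEXT, cost REAL);
    INSERT INTO patients VALUES (1, 1), (2, 1), (3, 0), (4, 1);
    INSERT INTO visits VALUES
        (1, 'north', 100), (2, 'north', 50),
        (3, 'south', 70), (4, 'south', 30);
    """
)

# Join + aggregate query an agent might generate (hypothetical).
agent_sql = """
SELECT v.clinic, AVG(v.cost) AS avg_cost
FROM visits v JOIN patients p ON v.patient_id = p.patient_id
GROUP BY v.clinic
"""

# Policy as an aggregate predicate: MIN over the consent flags of all patients
# tuples in the group's monomials must be 1 (i.e. every one of them consented).
guarded_sql = agent_sql + " HAVING MIN(p.consent) = 1"
rows = conn.execute(guarded_sql).fetchall()
print(rows)  # [('north', 75.0)]

# Contrast: filtering rows before aggregation instead releases a 'south'
# group computed from patient 4 alone.
prefiltered_sql = agent_sql.replace("GROUP BY", "WHERE p.consent = 1 GROUP BY")
prefiltered = conn.execute(prefiltered_sql).fetchall()
```

The two variants encode different semantics: the `HAVING` form withholds any group touched by a non-consenting patient, while the `WHERE` form silently recomputes the group from the remaining rows. Which behavior a policy should have is a policy decision; this summary does not say which one Passant adopts.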
Topics
- Data Flow Control
- AI Agents
- Data Safety Policies
- DBMS Query Rewriting
- Provenance Monomials
- Regulatory Compliance
Best for: CTO, AI Architect, VP of Engineering/Data, AI Scientist, AI Engineer, Data Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.