Definity embeds agents inside Spark pipelines to catch failures before they reach agentic AI systems
Summary
Definity, a Chicago-based data pipeline operations startup, has raised $12 million in Series A financing led by GreatPoint Ventures, with participation from Dynatrace, StageOne Ventures, and Hyde Park Venture Partners. The company addresses the challenge of pipeline reliability for data engineering teams by embedding agentic AI directly within Spark or dbt drivers. Unlike existing monitoring tools that operate after job completion, Definity's agents act during a pipeline run, capturing real-time execution context such as query behavior, memory pressure, and data skew. This in-execution approach allows for real-time detection and prevention of issues, including modifying resource allocation mid-run or stopping jobs before bad data propagates. An enterprise customer, Nexxen, reported identifying 33% of optimization opportunities within the first week and reducing troubleshooting effort by 70% after deploying Definity.
Key takeaway
For CTOs and VPs of Engineering overseeing large-scale data operations, consider evaluating in-execution agentic AI solutions like Definity to shift from reactive monitoring to proactive pipeline optimization. Nexxen reported a 70% reduction in troubleshooting effort after deploying Definity, the kind of saving that frees your team to support critical AI workloads and accelerate product roadmaps.
Key insights
In-execution agentic AI prevents data pipeline failures and optimizes performance in real time.
Principles
- Real-time context is critical for pipeline control.
- Intervention during execution prevents downstream issues.
- Proactive optimization reduces operational costs.
Method
Definity installs a JVM agent within the pipeline execution layer to capture real-time data, infer lineage, and enable mid-run modifications or preemption based on detected conditions.
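To make the idea of acting "based on detected conditions" concrete, here is a minimal sketch of an in-run decision rule. It is not Definity's API or implementation: the `ExecutionSnapshot` fields, the `Action` names, and the thresholds are all hypothetical, chosen only to illustrate how signals captured during a run could map to a mid-run modification or a preemption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ADD_RESOURCES = "add_resources"  # stands in for a mid-run modification
    STOP_JOB = "stop_job"            # stands in for preemption


@dataclass
class ExecutionSnapshot:
    """Hypothetical in-run signals; field names are illustrative only."""
    memory_used_fraction: float  # 0.0 to 1.0 of executor memory in use
    null_fraction: float         # share of nulls seen so far in a key column


def decide(snapshot: ExecutionSnapshot,
           memory_limit: float = 0.9,
           null_limit: float = 0.2) -> Action:
    """Map one snapshot to an action. Thresholds are assumed defaults."""
    # Data-quality problems take priority: stop before bad rows are written.
    if snapshot.null_fraction > null_limit:
        return Action.STOP_JOB
    # Memory pressure is treated as recoverable, so request more resources.
    if snapshot.memory_used_fraction > memory_limit:
        return Action.ADD_RESOURCES
    return Action.CONTINUE
```

The point of the sketch is the ordering: a check that runs during execution can stop a job before its output exists, which a post-completion monitor cannot do.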
In practice
- Embed agents directly in Spark/dbt drivers.
- Capture execution context during pipeline runs.
- Automate resource allocation and job preemption.
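As one illustration of the execution context listed above, data skew can be estimated from per-task row counts while a stage is still running. The sketch below is an assumption about how such a signal might be computed, not a description of Definity's agent; the function names and the threshold of 5.0 are hypothetical.

```python
from statistics import median


def skew_ratio(rows_per_task: list[int]) -> float:
    """Largest task's row count divided by the median task's row count."""
    if not rows_per_task:
        raise ValueError("need at least one task")
    typical = median(rows_per_task)
    if typical == 0:
        # Avoid dividing by zero: all-empty tasks are treated as balanced.
        return float("inf") if max(rows_per_task) > 0 else 1.0
    return max(rows_per_task) / typical


def is_skewed(rows_per_task: list[int], threshold: float = 5.0) -> bool:
    """Flag a stage whose heaviest task far exceeds the typical task."""
    return skew_ratio(rows_per_task) > threshold
```

For example, `is_skewed([100, 110, 90, 5000])` flags the stage because one task processes far more rows than the median task, while `is_skewed([100, 100, 100, 100])` does not.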
Topics
- Definity Platform
- Spark Pipelines
- In-Execution Agents
- Data Pipeline Reliability
- Agentic AI Systems
Best for: CTO, VP of Engineering/Data, AI Architect, Data Engineer, MLOps Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.