The Agentic Gap: What Enterprises Think vs. What Actually Works With Jeff Dalton
Summary
Jeff Dalton, Head of AI and Chief Scientist at Valence, discussed agentic system design, evaluation, and memory management in AI coaching systems. He argued that classical AI theory remains relevant: agent fundamentals such as planning, action, observation, and state maintenance are unchanged, even though large language models have changed how they are implemented. Dalton detailed Valence's AI coach, Nadia, which aims to improve user performance over time by optimizing for long-term value, introducing "productive friction," and personalizing interactions based on organizational values and user profiles. He stressed an "eval-first" approach built on structured prompts treated as code and human-calibrated rubrics for subjective evaluations. For guardrails, he described a multi-layered "defense-in-depth" strategy to ensure safety and address domain-specific challenges in enterprise deployments.
Key takeaway
AI engineers building complex agentic systems should adopt an "eval-first" mindset and structured prompt design to keep systems traceable and debuggable. Define clear objective functions and use human-calibrated rubrics for subjective evaluations, especially in coaching or personalized AI, so that long-term impact and user growth are measured accurately. Consider a defense-in-depth approach to guardrails to manage diverse enterprise deployment risks and maintain user trust.
Key insights
Classical AI agent principles remain vital, with modern LLMs enabling new implementation methods and complex agentic systems.
Principles
- Agent fundamentals (plan, act, observe, state) are timeless.
- Evaluation must precede prompt engineering.
- Memory is a first-class object in coaching systems.
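The plan, act, observe, and state cycle named above can be sketched in a few lines of Python. This is an illustrative toy, not Valence's implementation: the names AgentState, run_agent, toy_plan, and toy_act are hypothetical, and a real system would replace the toy planner with an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """State carried across steps; memory is an explicit, inspectable field."""
    goal: str
    memory: list[str] = field(default_factory=list)
    done: bool = False


def run_agent(
    state: AgentState,
    plan: Callable[[AgentState], str],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> AgentState:
    """Run the plan -> act -> observe -> update-state loop until done or out of steps."""
    for _ in range(max_steps):
        if state.done:
            break
        action = plan(state)               # plan: choose the next action from current state
        observation = act(action)          # act: execute it and observe the result
        state.memory.append(observation)   # state: record the observation in memory
        state.done = observation == "done"
    return state


def toy_plan(state: AgentState) -> str:
    # Hypothetical planner: ask two questions, then finish.
    return "ask_question" if len(state.memory) < 2 else "finish"


def toy_act(action: str) -> str:
    # Hypothetical environment: echoes the action, or signals completion.
    return "done" if action == "finish" else f"observed:{action}"


final = run_agent(
    AgentState(goal="prepare for a feedback conversation"), toy_plan, toy_act
)
```

Keeping memory as a named field on the state, instead of burying it in prompt text, is one way to treat it as a first-class object that can be logged, inspected, or cleared.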
Method
Design agentic systems with structured prompts treated as code, so that prompts can be traced and inspected. Use human-calibrated rubrics for subjective evaluation, scoring process quality, outcome quality, and user progress over time.
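A minimal sketch of both ideas, under stated assumptions: PromptSpec, RUBRIC_WEIGHTS, rubric_score, and agreement_rate are hypothetical names, the weights and the 1-to-5 scale are invented for illustration, and the three rubric dimensions are the ones listed above. The source does not describe the actual rubric or prompt format used.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """A prompt treated as code: named, versioned, and fingerprinted for trace logs."""
    name: str
    version: str
    template: str

    def render(self, **fields: str) -> str:
        return self.template.format(**fields)

    def fingerprint(self) -> str:
        # Stable short ID so a logged output can be tied to the exact prompt text.
        raw = f"{self.name}:{self.version}:{self.template}"
        return hashlib.sha256(raw.encode()).hexdigest()[:12]


# Hypothetical weights over the three dimensions named in the text.
RUBRIC_WEIGHTS = {"process_quality": 0.4, "outcome_quality": 0.4, "user_progress": 0.2}


def rubric_score(ratings: dict[str, int], scale_max: int = 5) -> float:
    """Combine per-dimension ratings (1..scale_max) into a weighted score in [0, 1]."""
    missing = set(RUBRIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing rubric dimensions: {sorted(missing)}")
    return sum(w * ratings[dim] / scale_max for dim, w in RUBRIC_WEIGHTS.items())


def agreement_rate(human: list[int], judge: list[int], tolerance: int = 1) -> float:
    """Calibration check: share of items where the automated judge is within
    `tolerance` points of the human rating."""
    if len(human) != len(judge) or not human:
        raise ValueError("rating lists must be non-empty and the same length")
    hits = sum(abs(h - j) <= tolerance for h, j in zip(human, judge))
    return hits / len(human)


spec = PromptSpec("coach_reply", "v1", "User goal: {goal}\nRespond as a coach.")
prompt = spec.render(goal="give clearer feedback")
score = rubric_score({"process_quality": 4, "outcome_quality": 3, "user_progress": 5})
calibration = agreement_rate([4, 3, 5, 2], [4, 4, 3, 2])
```

Logging the fingerprint next to each model output makes a regression traceable to a specific prompt version, and a low agreement rate is a signal to revise the rubric or the judge before trusting its scores.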
In practice
- Start with simple prompts and small data sets for initial evaluation.
- Implement multi-layered guardrails for robust safety.
- Prioritize user privacy and control over personal data.
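Multi-layered guardrails can be sketched as an ordered list of independent checks, where any layer can block a message. The layers below (an email-address check and a keyword-based scope check) are hypothetical stand-ins chosen for illustration; the source does not say which layers the production system uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns a reason string when it blocks, or None when it passes.
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    allowed: bool
    layer: Optional[str] = None
    reason: Optional[str] = None


def block_email(text: str) -> Optional[str]:
    # Hypothetical privacy layer: flag anything that looks like an email address.
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        return "contains an email address"
    return None


def block_out_of_scope(text: str) -> Optional[str]:
    # Hypothetical domain layer: topics a coaching assistant should decline.
    for phrase in ("medical diagnosis", "legal advice"):
        if phrase in text.lower():
            return f"out of scope: {phrase}"
    return None


LAYERS: list[tuple[str, Check]] = [
    ("privacy", block_email),
    ("scope", block_out_of_scope),
]


def run_guardrails(text: str, layers: list[tuple[str, Check]] = LAYERS) -> Verdict:
    """Apply each layer in order; the first layer that objects blocks the text."""
    for name, check in layers:
        reason = check(text)
        if reason is not None:
            return Verdict(allowed=False, layer=name, reason=reason)
    return Verdict(allowed=True)


ok = run_guardrails("Help me plan a 1:1 with my manager")
pii = run_guardrails("Send the notes to jane@example.com")
scope = run_guardrails("I need legal advice about my contract")
```

Because each layer is a separate function, a deployment can add, remove, or reorder checks per customer without touching the others, and the returned layer name shows which check fired.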
Topics
- Agentic Systems
- AI Coaching
- Evaluation Methodologies
- Guardrail Systems
- Memory Management
Best for: AI Scientist, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Explained.