RAG for SKILLS: Retrieval Augmented Execution (SkillRAE)
Summary
A new study introduces SkillRAE (Retrieval Augmented Execution), a system that challenges the naive assumption in current AI engineering practice that an LLM can execute tasks autonomously once it is simply handed a tool repository. SkillRAE shifts retrieval from passive factual knowledge (text) to active procedural operators (executable functions, code blocks, API calls).

Offline, it constructs a multi-level bipartite skill graph comprising skill communities, skill nodes, and sub-unit nodes, using sentence-transformer embeddings and K-means clustering. Online, it employs a dual-signal retrieval mechanism, top-down (macro) and bottom-up (micro), to identify relevant skills and sub-units, including sub-units that belong to skills that were not selected. Its core algorithmic novelty is context compilation, which rescues those useful sub-units and grafts them onto compatible selected skills, so the LLM receives a compiled, executable blueprint rather than raw, isolated tools.

On the SkillBench and AgentSkill OS benchmarks, SkillRAE is reported to significantly outperform other methods, including vanilla retrieval and Skill Router, when run with Codex CLI (GPT-5.2) and Gemini CLI (Gemini 3 Flash).
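To make the offline stage more concrete, here is a minimal sketch of a three-level skill graph (communities, skills, sub-units). It is an illustration under stated assumptions, not the study's implementation: the toy `embed` function stands in for a sentence transformer, the small `kmeans` helper stands in for a library K-means, and the names `SkillGraph`, `build_graph`, `SKILLS`, and `VOCAB` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical vocabulary for the toy embedding below.
VOCAB = ["pdf", "table", "extract", "csv", "plot", "chart", "parse", "load"]

def embed(text: str) -> list[float]:
    """Toy normalized bag-of-words vector; a stand-in for a sentence transformer."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def kmeans(points: list[list[float]], k: int, iters: int = 10) -> list[int]:
    """Plain Lloyd's K-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

@dataclass
class SkillGraph:
    communities: dict[int, list[str]] = field(default_factory=dict)  # community -> skills
    skill_units: dict[str, list[str]] = field(default_factory=dict)  # skill -> sub-units
    unit_skills: dict[str, list[str]] = field(default_factory=dict)  # sub-unit -> skills

def build_graph(skills: dict[str, dict], k: int) -> SkillGraph:
    """Cluster skill descriptions into communities and index skill/sub-unit edges."""
    names = list(skills)
    labels = kmeans([embed(skills[n]["description"]) for n in names], k)
    graph = SkillGraph()
    for name, label in zip(names, labels):
        graph.communities.setdefault(label, []).append(name)
        graph.skill_units[name] = list(skills[name]["units"])
        for unit in skills[name]["units"]:
            graph.unit_skills.setdefault(unit, []).append(name)
    return graph

# Hypothetical skill repository used only for this illustration.
SKILLS = {
    "pdf_tables": {"description": "extract table pdf", "units": ["parse pdf", "extract table"]},
    "chart_maker": {"description": "plot chart csv", "units": ["load csv", "plot chart"]},
    "pdf_text": {"description": "parse pdf extract", "units": ["parse pdf"]},
    "csv_plotter": {"description": "load csv plot", "units": ["load csv", "plot chart"]},
}

graph = build_graph(SKILLS, k=2)
print(graph.communities)
print(graph.unit_skills["parse pdf"])
```

The reverse index `unit_skills` is what makes the skill/sub-unit layer bipartite in this sketch: a sub-unit shared by several skills appears once and points back to every skill that owns it.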
Key takeaway
For AI Architects and Research Scientists designing autonomous AI systems, SkillRAE demonstrates that simply providing LLMs with tool repositories is insufficient for robust execution. You should consider implementing a retrieval augmented execution (RAE) system that compiles executable skills and their sub-units into a coherent blueprint. This approach bypasses the LLM's limitations in on-the-fly dependency resolution, significantly improving performance and enabling more reliable agent behavior in complex, document-centric, or data-intensive workflows.
Key insights
SkillRAE compiles executable operators into a coherent blueprint, overcoming LLM limitations in dynamic skill execution.
Principles
- LLMs behave like "raw CPUs": they need compiled tools, not raw ones.
- Retrieval for operators demands compilation, not just concatenation.
- Dual-signal retrieval (macro/micro) enhances skill relevance.
Method
SkillRAE builds an offline multi-level skill graph, then uses online dual-signal retrieval (top-down and bottom-up) to select and compile executable sub-units into a coherent context for LLM execution, avoiding on-the-fly dependency resolution.
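The dual-signal retrieval step described above can be sketched roughly as follows. This is a simplified illustration, not the study's scoring method: Jaccard token overlap stands in for embedding similarity, and the names `COMMUNITIES`, `SKILLS`, `top_down`, `bottom_up`, `retrieve`, and the `threshold` value are all assumptions made for the example.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard token overlap; a stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Hypothetical two-community skill graph, flattened into dictionaries.
COMMUNITIES = {
    "documents": {"summary": "parse pdf extract table text",
                  "skills": ["pdf_tables", "pdf_text"]},
    "plotting": {"summary": "load csv plot chart",
                 "skills": ["chart_maker"]},
}
SKILLS = {
    "pdf_tables": {"description": "extract table from pdf",
                   "units": ["parse pdf", "extract table"]},
    "pdf_text": {"description": "extract text from pdf",
                 "units": ["parse pdf", "extract text"]},
    "chart_maker": {"description": "plot chart from csv",
                    "units": ["load csv", "plot chart"]},
}

def top_down(query: str, top_skills: int = 1) -> list[str]:
    """Macro signal: pick the best community, then its best-matching skills."""
    best = max(COMMUNITIES, key=lambda c: similarity(query, COMMUNITIES[c]["summary"]))
    ranked = sorted(COMMUNITIES[best]["skills"],
                    key=lambda s: similarity(query, SKILLS[s]["description"]),
                    reverse=True)
    return ranked[:top_skills]

def bottom_up(query: str, threshold: float = 0.2) -> list[tuple[str, str]]:
    """Micro signal: score every sub-unit directly, whichever skill owns it."""
    hits = []
    for skill, spec in SKILLS.items():
        for unit in spec["units"]:
            if similarity(query, unit) >= threshold:
                hits.append((skill, unit))
    return hits

def retrieve(query: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Return selected skills plus relevant sub-units owned by unselected skills."""
    selected = top_down(query)
    orphans = [(s, u) for s, u in bottom_up(query) if s not in selected]
    return selected, orphans

selected, orphans = retrieve("extract table from pdf and plot chart")
print(selected)  # skills chosen by the macro signal
print(orphans)   # useful sub-units whose owning skill was not selected
```

In this toy query the macro signal settles on the document community, while the micro signal still surfaces the plotting sub-unit; those leftover sub-units are the candidates a compilation step would try to rescue.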
In practice
- Implement multi-level skill graphs for complex agent tasks.
- Utilize dual-signal retrieval for comprehensive skill selection.
- Incorporate context compilation for operator-based workflows.
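As a rough illustration of context compilation, the sketch below grafts rescued sub-units onto a selected skill and renders the result as an ordered blueprint. The summary does not say how compatibility is decided, so this example assumes a simple rule: a sub-unit can be grafted when its input type matches the output type of the skill's last step. The `Unit`, `Skill`, `compile_context`, and `render` names and the `consumes`/`produces` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    consumes: str  # assumed input type label
    produces: str  # assumed output type label

@dataclass
class Skill:
    name: str
    units: list[Unit]

def compile_context(selected: list[Skill], orphans: list[Unit]):
    """Graft each orphan onto the first selected skill whose final output it can consume."""
    remaining = list(orphans)
    blueprint = []
    for skill in selected:
        steps = list(skill.units)
        grafted = True
        while grafted:  # keep chaining while some orphan fits the current tail
            grafted = False
            for orphan in remaining:
                if steps and orphan.consumes == steps[-1].produces:
                    steps.append(orphan)
                    remaining.remove(orphan)
                    grafted = True
                    break
        blueprint.append((skill.name, [u.name for u in steps]))
    # Orphans that fit nowhere are reported instead of being passed along raw.
    return blueprint, [u.name for u in remaining]

def render(blueprint) -> str:
    """Turn the compiled blueprint into ordered text for the LLM context."""
    lines = []
    for skill, steps in blueprint:
        lines.append(f"# {skill}")
        lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines)

# Hypothetical retrieval output: one selected skill and two rescued sub-units.
selected = [Skill("pdf_tables", [Unit("parse_pdf", "pdf", "pages"),
                                 Unit("extract_table", "pages", "table")])]
orphans = [Unit("plot_chart", "table", "image"),
           Unit("send_email", "message", "receipt")]

blueprint, dropped = compile_context(selected, orphans)
print(render(blueprint))
print("dropped:", dropped)
```

The point of the design is that dependency resolution happens before the prompt is built: the model sees one ordered chain per skill, and incompatible sub-units are left out rather than handed over as isolated tools.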
Topics
- SkillRAE
- Retrieval Augmented Execution
- LLM Agent Skills
- Skill Graph
- Context Compilation
Best for: AI Architect, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.