New RFP on Interpretability from Schmidt Sciences
Summary
Schmidt Sciences has launched a Request for Proposals (RFP) for a pilot program in AI interpretability, seeking methods to detect and mitigate deceptive behaviors in large language models (LLMs), with proposals due by May 26, 2026. The program focuses on three directions: detecting contradictions between an LLM's output and its internal representation, steering models toward truthfulness through mechanistic understanding, and applying these techniques to improve human-AI collaboration and multi-agent systems. Successful proposals, eligible for $300k-$1M over 1-3 years, should use interpretability to outperform baselines that do not rely on weight access, and should address concrete risks beyond academic benchmarks. Schmidt Sciences also offers grantees significant compute resources, software engineering support, and API credits, and emphasizes ambitious, field-shaping contributions to AI safety.
Key takeaway
Schmidt Sciences invites proposals for AI interpretability research focused on detecting and mitigating deceptive LLM behaviors such as misleading advice or false claims. Funded projects ($300k-$1M over 1-3 years) must develop methods that use model internals to outperform black-box baselines at identifying and steering model reasoning. The initiative aims to advance universal deception detection and reliable truthfulness steering, both crucial for building safer, more trustworthy AI systems and for human-AI collaboration.
Topics
- AI Interpretability
- Deceptive AI Behaviors
- LLM Steering
- Mechanistic Interpretability
- AI Safety
Best for: Research Scientist, AI Researcher, AI Scientist, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.