New RFP on Interpretability from Schmidt Sciences

2026-03-17 · Source: AI Alignment Forum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, medium

Summary

Schmidt Sciences has launched a Request for Proposals (RFP) for a pilot program in AI interpretability, inviting submissions by May 26, 2026 on methods to detect and mitigate deceptive behaviors in large language models (LLMs). The program focuses on three directions: detecting contradictions between an LLM's output and its internal representations, steering models toward truthfulness through mechanistic understanding, and applying these techniques to improve human-AI collaboration and multi-agent systems. Successful proposals, eligible for $300k to $1M over one to three years, should use interpretability to outperform baselines that do not rely on access to model weights, and should address concrete risks rather than only academic benchmarks. Schmidt Sciences also offers grantees significant compute resources, software engineering support, and API credits, and emphasizes ambitious, field-shaping contributions to AI safety.

Key takeaway

Schmidt Sciences invites proposals for AI interpretability research focused on detecting and mitigating deceptive LLM behaviors such as misleading advice or false claims. Projects, funded at $300k to $1M over one to three years, must develop methods that use model internals to outperform black-box baselines at identifying and steering model reasoning. The initiative aims to advance universal deception detection and reliable truthfulness steering, both of which matter for building safer, more trustworthy AI systems and for human-AI collaboration.

Topics

AI Interpretability
Deceptive AI Behaviors
LLM Steering
Mechanistic Interpretability
AI Safety

Best for: Research Scientist, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.