Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints
Summary
An empirical study reveals a phenomenon called "Tool Suppression" in open-weight Large Language Models (LLMs) when Tool Calling and JSON Schema constraints are jointly applied. This behavior, observed across multiple model families in a production Agent system, causes LLMs to cease invoking tools despite maintaining high schema compliance. The research identifies that JSON Schema constraints, when compiled into grammar-based token masks, render tool-call tokens unreachable during decoding. This leads to the Constraint Priority Inversion (CPI) hypothesis, suggesting schema satisfaction dominates action-selection under multiple constraints. To address this, the study proposes Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation, successfully restoring tool invocation and structured output guarantees without model retraining. The findings highlight reliability issues overlooked by separate evaluations of tool use and structured output.
Key takeaway
For AI Engineers developing agent systems that combine tool calling with structured JSON output, you must account for "Tool Suppression." Jointly applying JSON Schema constraints can inadvertently disable tool invocation by making tool-call tokens unreachable. Implement the Transparent Two-Pass Execution strategy to decouple these processes. This ensures reliable tool use and strict schema compliance without model retraining. Your evaluation processes should also test these capabilities concurrently, not in isolation, to prevent critical reliability issues in production.
Key insights
Joint JSON Schema and Tool Calling constraints suppress tool invocation in open-weight LLMs due to token mask conflicts.
Principles
- JSON Schema constraints can make tool-call tokens unreachable.
- Separate evaluation of tool use and structured output is insufficient.
- Schema satisfaction may dominate action-selection under multiple constraints (CPI).
Method
Transparent Two-Pass Execution decouples tool execution from schema-constrained response generation, restoring tool invocation while preserving structured output guarantees without retraining.
In practice
- Implement Transparent Two-Pass Execution for joint constraints.
- Evaluate tool use and structured output jointly for reliability.
- Be aware of token mask conflicts with grammar-based constraints.
Topics
- Tool Calling
- Structured Output
- JSON Schema
- Open-Weight LLMs
- Agent Systems
- Constrained Decoding
Code references
Best for: AI Architect, Research Scientist, CTO, AI Scientist, Machine Learning Engineer, AI Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.