Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

2026-06-24 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

An empirical study reveals a phenomenon called "Tool Suppression" in open-weight Large Language Models (LLMs) when Tool Calling and JSON Schema constraints are jointly applied. This behavior, observed across multiple model families in a production Agent system, causes LLMs to cease invoking tools despite maintaining high schema compliance. The research identifies that JSON Schema constraints, when compiled into grammar-based token masks, render tool-call tokens unreachable during decoding. This leads to the Constraint Priority Inversion (CPI) hypothesis, suggesting schema satisfaction dominates action-selection under multiple constraints. To address this, the study proposes Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation, successfully restoring tool invocation and structured output guarantees without model retraining. The findings highlight reliability issues overlooked by separate evaluations of tool use and structured output.

Key takeaway

For AI Engineers developing agent systems that combine tool calling with structured JSON output, you must account for "Tool Suppression." Jointly applying JSON Schema constraints can inadvertently disable tool invocation by making tool-call tokens unreachable. Implement the Transparent Two-Pass Execution strategy to decouple these processes. This ensures reliable tool use and strict schema compliance without model retraining. Your evaluation processes should also test these capabilities concurrently, not in isolation, to prevent critical reliability issues in production.

Key insights

Joint JSON Schema and Tool Calling constraints suppress tool invocation in open-weight LLMs due to token mask conflicts.

Principles

JSON Schema constraints can make tool-call tokens unreachable.
Separate evaluation of tool use and structured output is insufficient.
Schema satisfaction may dominate action-selection under multiple constraints (CPI).

Method

Transparent Two-Pass Execution decouples tool execution from schema-constrained response generation, restoring tool invocation while preserving structured output guarantees without retraining.

In practice

Implement Transparent Two-Pass Execution for joint constraints.
Evaluate tool use and structured output jointly for reliability.
Be aware of token mask conflicts with grammar-based constraints.

Topics

Tool Calling
Structured Output
JSON Schema
Open-Weight LLMs
Agent Systems
Constrained Decoding

Code references

Best for: AI Architect, Research Scientist, CTO, AI Scientist, Machine Learning Engineer, AI Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.