Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

An empirical study reveals a phenomenon called "Tool Suppression" in open-weight Large Language Models (LLMs) when Tool Calling and JSON Schema constraints are jointly applied. This behavior, observed across multiple model families in a production Agent system, causes LLMs to cease invoking tools despite maintaining high schema compliance. The research identifies that JSON Schema constraints, when compiled into grammar-based token masks, render tool-call tokens unreachable during decoding. This leads to the Constraint Priority Inversion (CPI) hypothesis, suggesting schema satisfaction dominates action-selection under multiple constraints. To address this, the study proposes Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation, successfully restoring tool invocation and structured output guarantees without model retraining. The findings highlight reliability issues overlooked by separate evaluations of tool use and structured output.

Key takeaway

For AI Engineers developing agent systems that combine tool calling with structured JSON output, you must account for "Tool Suppression." Jointly applying JSON Schema constraints can inadvertently disable tool invocation by making tool-call tokens unreachable. Implement the Transparent Two-Pass Execution strategy to decouple these processes. This ensures reliable tool use and strict schema compliance without model retraining. Your evaluation processes should also test these capabilities concurrently, not in isolation, to prevent critical reliability issues in production.

Key insights

Joint JSON Schema and Tool Calling constraints suppress tool invocation in open-weight LLMs due to token mask conflicts.

Principles

Method

Transparent Two-Pass Execution decouples tool execution from schema-constrained response generation, restoring tool invocation while preserving structured output guarantees without retraining.

In practice

Topics

Code references

Best for: AI Architect, Research Scientist, CTO, AI Scientist, Machine Learning Engineer, AI Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.