TiCo: Time-Controllable Training for Spoken Dialogue Models

2026-03-23 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

TiCo is a post-training method that enables spoken dialogue models (SDMs) to generate responses of a controllable duration in response to time-constrained instructions. Existing SDMs, both open-source and commercial, often fail to meet explicit duration requirements such as "generate a response lasting about 15 seconds" because they have no inherent awareness of time. TiCo addresses this by introducing Spoken Time Markers (STM), such as <10.6 seconds>, which let the model estimate elapsed speaking time during generation and adjust its content to match a target duration. The method is data-efficient: it applies reinforcement learning to a small amount of self-generated data and requires no additional question-answer pairs. In the reported evaluations, TiCo significantly improves adherence to duration constraints while preserving response quality.

Key takeaway

For NLP engineers building voice assistants or interactive agents, TiCo targets a practical user-experience problem: spoken responses that run longer or shorter than the user asked for. Controlling response duration lets a system honor requests such as "about 15 seconds." Consider adopting TiCo's Spoken Time Markers and reinforcement learning approach where adherence to time constraints matters for interaction quality.

Key insights

TiCo enables spoken dialogue models to control response duration using Spoken Time Markers and reinforcement learning.

Principles

Time awareness improves spoken interaction quality.
Post-training can add new model capabilities efficiently.

Method

TiCo uses Spoken Time Markers (STM) to track elapsed speaking time during generation. It employs self-generation and reinforcement learning with a small dataset to train models to meet target durations.
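
The idea of interleaving time markers with spoken content can be illustrated with a minimal sketch. It assumes word-level end times are already available (for example from a forced aligner) and places a marker after the first word that crosses each interval boundary. The `TimedWord` type, the `insert_time_markers` function, and the 2-second interval are hypothetical choices made for illustration; only the marker format, such as <10.6 seconds>, comes from the summary above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    """A word with the time (in seconds) at which it finishes being spoken."""
    text: str
    end_s: float


def insert_time_markers(words: List[TimedWord], interval_s: float = 2.0) -> str:
    """Interleave markers like '<2.1 seconds>' with the transcript.

    A marker is emitted after the first word whose end time reaches or
    passes the next interval boundary. The interval is an illustrative
    assumption, not a value taken from the TiCo paper.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    pieces: List[str] = []
    next_mark = interval_s
    for word in words:
        pieces.append(word.text)
        if word.end_s >= next_mark:
            pieces.append(f"<{word.end_s:.1f} seconds>")
            # Skip every boundary already covered by this word.
            while next_mark <= word.end_s:
                next_mark += interval_s
    return " ".join(pieces)


if __name__ == "__main__":
    demo = [
        TimedWord("Sure,", 0.4),
        TimedWord("here", 0.7),
        TimedWord("is", 0.9),
        TimedWord("a", 1.0),
        TimedWord("short", 2.1),
        TimedWord("answer.", 2.6),
    ]
    print(insert_time_markers(demo))
```

A sequence annotated this way exposes the elapsed time inline, which is the kind of signal a model could use to decide whether to wrap up or keep talking.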

In practice

Insert STMs into the generated response so the model can track elapsed speaking time.
Apply reinforcement learning for duration adherence.
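
As one way to picture the reinforcement learning signal, the sketch below scores a response by how close its spoken duration is to the requested one. The summary does not specify TiCo's reward, so the relative-error form, the clipping at zero, and the name `duration_reward` are all assumptions made for illustration.

```python
def duration_reward(actual_s: float, target_s: float) -> float:
    """Return a score in [0, 1] that is highest when durations match.

    Hypothetical reward shape (not taken from the TiCo paper):
    1 minus the relative duration error, clipped at 0.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    relative_error = abs(actual_s - target_s) / target_s
    return max(0.0, 1.0 - relative_error)


if __name__ == "__main__":
    # Compare three candidate responses against a 15-second target.
    for actual in (15.0, 12.0, 45.0):
        print(actual, duration_reward(actual, target_s=15.0))
```

A reward like this only measures duration; in practice it would need to be paired with some measure of response quality so that the model does not hit the target by padding or truncating content.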

Topics

Spoken Dialogue Models
Time-Controllable Training
Spoken Time Markers
Reinforcement Learning
Speech Generation

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.