Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues

2026-04-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

Researchers have developed Syn-TurnTurk, a synthetic Turkish dialogue dataset designed to improve turn-taking prediction in voice-based chatbots. Current systems often struggle with natural conversational flow because they rely on simple silence detection, which leads them to interrupt users. The problem is worse in languages such as Turkish that lack suitable datasets. Syn-TurnTurk was generated with Qwen large language models (LLMs) to simulate realistic human speech patterns, including overlaps and strategic silences. The evaluation covered both traditional and deep learning approaches, specifically a Bi-LSTM and an ensemble of logistic regression and random forest (LR+RF), and reported an accuracy of 0.839 and an AUC of 0.910. These results indicate that the synthetic dataset helps models interpret linguistic cues, enabling more natural human-machine interaction in Turkish.
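The two reported metrics measure different things. Accuracy counts thresholded predictions that match the label. AUC measures how well a model's scores rank true turn shifts above holds, independent of any threshold. The sketch below computes both for a binary turn-shift label. The labels, scores, and the 0.5 threshold are illustrative assumptions, not values from the source.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    correct = sum(int((s >= threshold) == bool(y)) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = "turn ends here (shift)", 0 = "speaker holds the turn".
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
print(accuracy(labels, scores))
print(roc_auc(labels, scores))
```

A model can score well on AUC while its accuracy depends heavily on where the decision threshold is placed. This is why reporting both numbers, as the study does, gives a fuller picture.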

Key takeaway

For research scientists developing conversational AI in low-resource languages, this work demonstrates that synthetic datasets, like Syn-TurnTurk, can significantly enhance turn-taking prediction. You should explore using LLMs to generate linguistically rich synthetic data to overcome the scarcity of real-world dialogue corpora, thereby improving the naturalness of human-machine interactions in your target language.

Key insights

Synthetic datasets can effectively train models for complex linguistic tasks like turn-taking prediction in under-resourced languages.

Principles

Silence detection alone is insufficient for natural turn-taking.
Synthetic data can bridge gaps for low-resource languages.
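To make the first principle concrete, here is a minimal sketch of a silence-only endpointer, the kind of baseline the summary criticizes. The `Token` structure, the 700 ms threshold, and the pause durations are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    word: str
    pause_after_ms: int  # silence following this word, in milliseconds (hypothetical)


def silence_endpoints(tokens: List[Token], threshold_ms: int = 700) -> List[int]:
    """Indices where a silence-only endpointer would hand the turn to the bot."""
    return [i for i, t in enumerate(tokens) if t.pause_after_ms >= threshold_ms]


# Hypothetical utterance with a hesitation pause after "için":
# "Yarın için ... bir masa ayırtmak istiyorum" ("I'd like to book a table for tomorrow").
utterance = [
    Token("Yarın", 120),
    Token("için", 900),        # speaker is still planning; the turn is NOT finished
    Token("bir", 80),
    Token("masa", 100),
    Token("ayırtmak", 90),
    Token("istiyorum", 1200),  # the main verb arrives; the turn really ends here
]

print(silence_endpoints(utterance))  # [1, 5]
```

The endpointer fires at index 1, a mid-sentence hesitation, and would interrupt the user. A model that also reads linguistic cues could, for example, take into account that Turkish is typically verb-final and that no main verb has been spoken yet.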

Method

The Syn-TurnTurk dataset was generated using Qwen Large Language Models to simulate Turkish dialogues, incorporating overlaps and strategic silences to mirror real-life verbal exchanges.
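The summary does not give the dataset's file format. As an illustration only, suppose the LLM is prompted to emit each dialogue as segments, each annotated with the silence before the next segment, where a negative value marks an overlap. The hypothetical `Segment` and `to_examples` names below show one way such output could be turned into turn-taking training rows. Each segment boundary is labelled as a shift (the next speaker differs) or a hold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str
    text: str
    gap_ms: int  # silence before the next segment; negative = next segment overlaps this one


def to_examples(segments: List[Segment]) -> List[Tuple[str, int, int]]:
    """Build (text, gap_ms, label) rows; label 1 = turn shift, 0 = same speaker holds."""
    rows = []
    # The last segment has no successor, so it yields no labelled boundary.
    for cur, nxt in zip(segments, segments[1:]):
        label = int(nxt.speaker != cur.speaker)
        rows.append((cur.text, cur.gap_ms, label))
    return rows


# Hypothetical LLM output for a short restaurant-booking exchange.
dialogue = [
    Segment("A", "Merhaba, yarın için", 800),            # strategic silence, A keeps the turn
    Segment("A", "bir masa ayırtmak istiyorum.", -150),  # B starts 150 ms before A finishes
    Segment("B", "Tabii, kaç kişi olacaksınız?", 400),
    Segment("A", "Dört kişi.", 0),
]

for row in to_examples(dialogue):
    print(row)
```

Keeping the gap as a feature lets a downstream model learn that long pauses do not always mean a shift, and that overlaps often precede one.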

In practice

Use Qwen LLMs for synthetic dialogue generation.
Use a Bi-LSTM or an LR+RF ensemble to build turn-taking models.
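The summary does not say how the LR+RF ensemble combines its members. One common choice is soft voting, which averages each model's predicted probability of a turn shift. The sketch below assumes that scheme with equal weights, and uses made-up probabilities in place of fitted LR and RF outputs.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of each model's P(turn shift) for every example."""
    if len(prob_lists) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]


# Placeholder probabilities standing in for a fitted logistic regression and random forest.
lr_probs = [0.80, 0.40, 0.55]
rf_probs = [0.60, 0.20, 0.35]

ensemble = soft_vote([lr_probs, rf_probs], weights=[1.0, 1.0])
predictions = [int(p >= 0.5) for p in ensemble]  # 1 = predict a turn shift
print(ensemble, predictions)
```

On the third example, logistic regression alone would predict a shift (0.55). The averaged score of 0.45 predicts a hold, which shows how the two models temper each other. In scikit-learn, the same scheme is available as `VotingClassifier(..., voting="soft")` wrapping `LogisticRegression` and `RandomForestClassifier`.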

Topics

Syn-TurnTurk
Turn-Taking Prediction
Turkish Dialogues
Synthetic Datasets
Large Language Models

Best for: Research Scientist, NLP Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.