Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

2026-06-23 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

A new study investigates the limits of audio-based detection for Turkish phone call scams, addressing a critical gap in research predominantly focused on high-resource languages. This work introduces the first public multi-modal dataset for Turkish, comprising 100 aligned audio-transcript pairs of scam and benign conversations. Researchers evaluated seven large language models, including Gemini 2.5 (Flash, Flash-Lite, Pro), GPT-4o, and Qwen (Max, Plus, Turbo), across three input conditions: raw audio, automatic speech-to-text transcripts, and native speaker-refined transcripts. The findings indicate that transcript-based inputs consistently outperform direct audio processing for scam detection. Furthermore, human-corrected transcripts performed comparably to uncorrected automatic speech-to-text transcripts. This research highlights the urgent need for culturally and linguistically inclusive AI safety and robust multi-modal fraud prevention systems.

Key takeaway

For NLP engineers building fraud detection systems for low-resource languages such as Turkish, these findings suggest prioritizing transcript-based inputs over direct audio processing. Automatic speech recognition (ASR) output can be used as-is: human-corrected transcripts performed comparably, so extensive manual correction adds annotation overhead without a clear gain. Focus effort on the automated parts of the pipeline, namely the ASR stage and LLM prompt engineering for transcript analysis, rather than on manual transcript correction, to build more effective and scalable fraud prevention systems.

Key insights

Transcript-based LLM analysis consistently outperforms raw-audio input for Turkish phone scam detection, even when the ASR transcripts are left uncorrected.

Principles

Prioritize low-resource languages for AI safety research.
Transcript-based inputs enhance audio scam detection.
Human correction of ASR offers minimal benefit for this task.

Method

Evaluate LLMs (Gemini, GPT-4o, Qwen) on a 100-pair multi-modal Turkish dataset. Compare raw audio, ASR, and human-refined transcript inputs for scam detection.
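The comparison described above can be sketched as a small evaluation harness. This is a minimal illustration, not the study's actual code: the `Sample` structure, the condition names, and the `classify` callable (which would wrap a call to one of the evaluated models) are all assumptions introduced here, and the metrics shown are standard choices that the source does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names for the three input conditions described in the summary.
CONDITIONS = ("raw_audio", "asr_transcript", "refined_transcript")


@dataclass
class Sample:
    # Maps each condition to its payload: an audio file path for
    # "raw_audio", transcript text for the two transcript conditions.
    inputs: Dict[str, str]
    is_scam: bool  # ground-truth label


def evaluate(
    samples: List[Sample],
    classify: Callable[[str, str], bool],
) -> Dict[str, Dict[str, float]]:
    """Score one model on every input condition.

    `classify(condition, payload)` is an assumed wrapper around a model
    call; it returns True when the model flags the call as a scam.
    """
    results: Dict[str, Dict[str, float]] = {}
    for condition in CONDITIONS:
        tp = fp = fn = tn = 0
        for sample in samples:
            predicted = classify(condition, sample.inputs[condition])
            if predicted and sample.is_scam:
                tp += 1
            elif predicted and not sample.is_scam:
                fp += 1
            elif not predicted and sample.is_scam:
                fn += 1
            else:
                tn += 1
        total = tp + fp + fn + tn
        # Guard against empty denominators so degenerate runs do not crash.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        results[condition] = {
            "accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results
```

Running `evaluate` once per model yields a model-by-condition table, which is the shape of comparison the summary describes. Passing the condition name to `classify` lets a single wrapper route audio and text payloads to different model endpoints.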

In practice

Prioritize ASR transcripts over raw audio for scam detection.
Utilize existing ASR without extensive human correction.
Develop multi-modal systems for fraud prevention.
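The recommendations above amount to a two-stage pipeline: transcribe the call, then ask an LLM to judge the transcript. The sketch below shows one possible shape for it. The prompt wording, the one-word answer format, and the `transcribe` and `ask_llm` callables are hypothetical placeholders, not the prompts or APIs used in the study; any real ASR engine or model client would be plugged in behind those two functions.

```python
from typing import Callable

# Hypothetical prompt; the study's actual prompt is not given in the source.
PROMPT_TEMPLATE = (
    "You are screening a Turkish phone call transcript for fraud.\n"
    "Answer with exactly one word: SCAM or BENIGN.\n\n"
    "Transcript:\n{transcript}\n"
)


def build_prompt(transcript: str) -> str:
    """Embed the (uncorrected) ASR transcript in the screening prompt."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_verdict(response: str) -> bool:
    """Map the model's reply to a boolean; True means scam.

    Only the first word is inspected, with surrounding punctuation removed.
    """
    words = response.strip().upper().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    if first == "SCAM":
        return True
    if first == "BENIGN":
        return False
    raise ValueError(f"Unrecognized verdict: {response!r}")


def detect_scam(
    audio_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
) -> bool:
    """Run ASR first, then classify the transcript with an LLM.

    `transcribe` and `ask_llm` are injected so that any ASR engine or
    LLM client can be used; both are assumptions of this sketch.
    """
    transcript = transcribe(audio_path)
    return parse_verdict(ask_llm(build_prompt(transcript)))
```

Injecting the two stages as callables keeps the pipeline testable with stubs and makes it easy to swap ASR engines or models when comparing them. Raising on an unrecognized reply, rather than defaulting to benign, avoids silently missing a scam when the model ignores the answer format.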

Topics

Large Language Models
Scam Detection
Turkish Language Processing
Automatic Speech Recognition
Multi-modal AI
AI Safety

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.