Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

A new study investigates the limits of audio-based detection of Turkish phone call scams, addressing a critical gap in a research area that has focused predominantly on high-resource languages. The work introduces the first public multi-modal dataset for Turkish, comprising 100 aligned audio-transcript pairs of scam and benign conversations. The researchers evaluated seven large language models (Gemini 2.5 Flash, Flash-Lite, and Pro; GPT-4o; and Qwen Max, Plus, and Turbo) under three input conditions: raw audio, automatic speech recognition (ASR) transcripts, and transcripts refined by native speakers. Transcript-based inputs consistently outperformed direct audio processing, and human-corrected transcripts performed comparably to uncorrected ASR transcripts. This research highlights the urgent need for culturally and linguistically inclusive AI safety and for robust multi-modal fraud prevention systems.

Key takeaway

For NLP engineers building fraud detection systems for low-resource languages such as Turkish, these findings suggest prioritizing transcript-based inputs over direct audio processing. Uncorrected automatic speech recognition (ASR) output is a reasonable default: human-corrected transcripts performed comparably, so manual correction adds annotation overhead without a clear gain. Focus your effort on the ASR pipeline and on LLM prompt design for transcript analysis rather than on extensive manual transcript correction; this should make multi-modal fraud prevention systems more effective and easier to scale.
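
A minimal sketch of the transcript-first pipeline this takeaway recommends. The ASR and LLM back ends are passed in as hypothetical callables (`transcribe`, `ask_llm`) rather than tied to any vendor API, and the prompt wording and strict one-word label parsing are illustrative assumptions, not the study's protocol:

```python
from typing import Callable

# Hypothetical prompt template: the study's actual prompts are not described in this summary.
PROMPT_TEMPLATE = (
    "You are a fraud analyst. Read the following Turkish phone call transcript "
    "and answer with exactly one word: SCAM or BENIGN.\n\n"
    "Transcript:\n{transcript}"
)


def build_prompt(transcript: str) -> str:
    """Insert an ASR (or refined) transcript into the classification prompt."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_label(reply: str) -> str:
    """Strictly map the model's reply to 'scam', 'benign', or 'unknown'.

    Only the first word is read, because the prompt asks for a one-word answer;
    anything else becomes 'unknown' so it can be routed to manual review.
    """
    words = reply.strip().split()
    first = words[0].strip(".,:;!\"'").upper() if words else ""
    return {"SCAM": "scam", "BENIGN": "benign"}.get(first, "unknown")


def classify_call(
    audio_path: str,
    transcribe: Callable[[str], str],  # hypothetical ASR wrapper: path -> transcript
    ask_llm: Callable[[str], str],     # hypothetical LLM wrapper: prompt -> reply
) -> str:
    """Transcript-first pipeline: raw ASR output goes to the LLM without manual correction."""
    transcript = transcribe(audio_path)
    return parse_label(ask_llm(build_prompt(transcript)))
```

For example, calling `classify_call("call.wav", my_asr, my_llm)` with stub callables where `my_llm` always replies `"SCAM"` returns `"scam"`. Keeping the back ends injectable makes it easy to swap ASR engines or LLMs without touching the prompt and parsing logic.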

Key insights

Transcript-based LLM analysis consistently outperforms raw audio for Turkish phone scam detection, even when the transcripts are uncorrected ASR output.

Method

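Evaluate seven LLMs (Gemini 2.5 Flash, Flash-Lite, and Pro; GPT-4o; Qwen Max, Plus, and Turbo) on a multi-modal Turkish dataset of 100 aligned audio-transcript pairs, comparing three inputs for scam detection: raw audio, ASR transcripts, and native speaker-refined transcripts.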
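
The evaluation is a grid of models by input conditions over the same conversations. A minimal sketch of how such a grid could be scored, assuming plain accuracy as the metric (this summary does not say which metric the study reports) and using hypothetical condition names and a hypothetical record layout:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# The three input conditions compared in the study; the string keys are illustrative names.
CONDITIONS = ("raw_audio", "asr_transcript", "refined_transcript")

# One record per (model, condition, conversation): (model, condition, gold, predicted).
Record = Tuple[str, str, str, str]


def accuracy_grid(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Accuracy for every (model, condition) cell of the evaluation grid."""
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for model, condition, gold, predicted in records:
        if condition not in CONDITIONS:
            raise ValueError(f"unknown input condition: {condition!r}")
        key = (model, condition)
        total[key] += 1
        correct[key] += int(gold == predicted)
    return {key: correct[key] / total[key] for key in total}


def mean_by_condition(grid: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Average each condition's accuracy across models (unweighted macro-average)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for (_model, condition), acc in grid.items():
        sums[condition] += acc
        counts[condition] += 1
    return {condition: sums[condition] / counts[condition] for condition in counts}
```

Comparing the `mean_by_condition` values for the three keys is one way to check the headline pattern (transcript inputs above raw audio, refined transcripts roughly equal to ASR) on your own data. With only 100 pairs per condition, small differences between cells may not be meaningful.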

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.