Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams
Summary
A new study investigates the limits of audio-based detection for Turkish phone call scams, addressing a critical gap in a research area that has focused predominantly on high-resource languages. This work introduces the first public multi-modal dataset for Turkish, comprising 100 aligned audio-transcript pairs of scam and benign conversations. Researchers evaluated seven large language models, namely Gemini 2.5 (Flash, Flash-Lite, Pro), GPT-4o, and Qwen (Max, Plus, Turbo), across three input conditions: raw audio, automatic speech-to-text transcripts, and native speaker-refined transcripts. The findings indicate that transcript-based inputs consistently outperform direct audio processing for scam detection. In addition, human-corrected transcripts performed comparably to the uncorrected automatic speech-to-text transcripts. This research highlights the urgent need for culturally and linguistically inclusive AI safety and robust multi-modal fraud prevention systems.
Key takeaway
For NLP engineers building fraud detection systems in low-resource languages such as Turkish, these findings suggest prioritizing transcript-based inputs over direct audio processing. You can use automatic speech recognition (ASR) output as is: human-corrected transcripts performed comparably, so skipping manual correction reduces annotation overhead. Focus your efforts on ASR quality or on prompt engineering for transcript analysis, rather than on extensive manual transcript correction, to build more effective and scalable multi-modal fraud prevention systems.
Key insights
Transcript-based LLM analysis consistently improves Turkish phone scam detection over raw audio input, even when the ASR transcripts are left uncorrected.
Principles
- Prioritize low-resource languages for AI safety research.
- Transcribing call audio before LLM analysis improves scam detection over raw audio input.
- Human correction of ASR transcripts offers minimal benefit for this task.
Method
Evaluate seven LLMs (Gemini 2.5, GPT-4o, Qwen) on a 100-pair multi-modal Turkish dataset. Compare raw audio, ASR transcript, and human-refined transcript inputs for scam detection.
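The comparison described in the method can be organized as a small evaluation harness: every model sees the same conversations under each of the three input conditions, and one metric is computed per (model, condition) cell. The sketch below is a minimal illustration under stated assumptions, not the authors' code. The Sample schema, the use of scam-class F1 as the metric, the keyword_baseline stand-in for an LLM, and the two toy samples are all hypothetical; real model clients would be wrapped as callables with the same shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The three input conditions compared in the study.
CONDITIONS = ("raw_audio", "asr_transcript", "refined_transcript")


@dataclass
class Sample:
    """One aligned conversation (hypothetical schema): gold label plus its three input variants."""
    label: str                 # "scam" or "benign"
    inputs: Dict[str, object]  # condition name -> audio bytes or transcript text


# Any callable mapping (condition, input) to a predicted label; an LLM client would be wrapped this way.
Classifier = Callable[[str, object], str]


def scam_f1(golds: List[str], preds: List[str]) -> float:
    """F1 score with "scam" as the positive class (metric choice is an assumption)."""
    pairs = list(zip(golds, preds))
    tp = sum(g == "scam" and p == "scam" for g, p in pairs)
    fp = sum(g != "scam" and p == "scam" for g, p in pairs)
    fn = sum(g == "scam" and p != "scam" for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(models: Dict[str, Classifier], samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Score every model under every condition on the same samples: {model: {condition: F1}}."""
    golds = [s.label for s in samples]
    return {
        name: {
            cond: scam_f1(golds, [model(cond, s.inputs[cond]) for s in samples])
            for cond in CONDITIONS
        }
        for name, model in models.items()
    }


# Toy stand-in for an LLM (hypothetical): flags transcripts that ask for an IBAN and cannot read audio.
def keyword_baseline(condition: str, x: object) -> str:
    if condition == "raw_audio":
        return "benign"
    return "scam" if "iban" in str(x).lower() else "benign"


# Two invented toy samples; not drawn from the study's dataset.
samples = [
    Sample("scam", {"raw_audio": b"", "asr_transcript": "hesabınız bloke edildi iban verin",
                    "refined_transcript": "Hesabınız bloke edildi, IBAN'ınızı verin."}),
    Sample("benign", {"raw_audio": b"", "asr_transcript": "randevunuz yarın saat üçte",
                      "refined_transcript": "Randevunuz yarın saat üçte."}),
]

print(evaluate({"keyword_baseline": keyword_baseline}, samples))
# {'keyword_baseline': {'raw_audio': 0.0, 'asr_transcript': 1.0, 'refined_transcript': 1.0}}
```

The toy output only demonstrates the harness mechanics and says nothing about the study's actual scores. Keeping the sample set fixed across conditions is the point of the design: any difference between columns can then be attributed to the input modality rather than to the data.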
In practice
- Prioritize ASR transcripts over raw audio for scam detection.
- Use off-the-shelf ASR output without extensive human correction.
- Develop multi-modal systems for fraud prevention.
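Taken together, these recommendations describe a simple pipeline: transcribe the call with ASR, skip manual correction, and ask an LLM to classify the transcript. The sketch below shows that flow under stated assumptions. The prompt wording, the one-word SCAM/BENIGN answer format, and the fake_asr and fake_llm stubs are hypothetical placeholders for a real ASR engine and LLM client; they are not the prompts or systems used in the study.

```python
from typing import Callable

# Hypothetical prompt wording; the study's actual prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "You are screening Turkish phone calls for fraud.\n"
    "Transcript (automatic, uncorrected):\n{transcript}\n\n"
    "Answer with exactly one word: SCAM or BENIGN."
)


def build_prompt(transcript: str) -> str:
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_label(response: str) -> str:
    """Map a free-text reply to 'scam', 'benign', or 'unknown' (unparseable replies are not forced to a class)."""
    words = response.strip().split()
    first = words[0].strip(".,:;!\"'").upper() if words else ""
    return {"SCAM": "scam", "BENIGN": "benign"}.get(first, "unknown")


def detect_scam(audio: bytes,
                transcribe: Callable[[bytes], str],
                complete: Callable[[str], str]) -> str:
    """Audio -> uncorrected ASR transcript -> LLM prompt -> label."""
    transcript = transcribe(audio)  # deliberately no manual correction step
    return parse_label(complete(build_prompt(transcript)))


# Stubs standing in for a real ASR engine and LLM client (both hypothetical).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" bytes are already recognized text


def fake_llm(prompt: str) -> str:
    return "SCAM." if "iban" in prompt.lower() else "Benign"


print(detect_scam("Hesabınız bloke edildi, IBAN'ınızı verin.".encode("utf-8"), fake_asr, fake_llm))  # scam
print(detect_scam("Randevunuz yarın saat üçte.".encode("utf-8"), fake_asr, fake_llm))  # benign
```

Passing the ASR and LLM steps in as callables keeps the pipeline independent of any particular vendor API, so the same code can be pointed at different models. Returning "unknown" instead of defaulting to "benign" is one possible design choice: it lets a deployment route malformed model replies to human review rather than silently clearing a call.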
Topics
- Large Language Models
- Scam Detection
- Turkish Language Processing
- Automatic Speech Recognition
- Multi-modal AI
- AI Safety
Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.