Authorship Attribution in Multilingual Machine-Generated Texts

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A new study introduces the problem of multilingual authorship attribution (AA) for machine-generated text. Rather than making a binary human-versus-machine decision, the task is to identify which specific Large Language Model (LLM) or human author produced a text, across diverse languages. Current AA work is largely confined to monolingual settings, primarily English. The researchers investigated how well existing monolingual AA methods transfer across 18 languages, covering multiple language families and writing scripts, and 8 generators (7 LLMs and the human-authored class). The paper, accepted at ACL 2026, finds that some monolingual AA techniques can be adapted for multilingual use, but substantial limitations persist, particularly when transferring attribution capabilities across language families. The results underscore the complexity of multilingual AA and the need for more robust methods that hold up in real-world scenarios.

Key takeaway

NLP engineers building machine-generated text (MGT) detection systems should expect current authorship attribution methods to struggle in multilingual settings. If your application spans diverse languages, monolingual approaches are not enough: prioritize developing or integrating AA models that are multilingual by design. Adapting existing monolingual methods offers limited success and may misidentify the generator, especially across language families.

Key insights

Multilingual authorship attribution for LLM-generated text is complex, with current monolingual methods showing limited cross-lingual transferability across diverse language families.

Method

The study tested whether monolingual AA methods are suitable for multilingual settings by measuring their cross-lingual transferability and the effect of the generator, across 18 languages and 8 generators (7 LLMs plus the human-authored class).
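
A cross-lingual transfer test of this kind can be sketched as a train-on-one-language, test-on-every-language loop that fills a source-by-target score matrix. The sketch below is illustrative only and is not the paper's code: the names `macro_f1`, `transfer_matrix`, and `train_majority` are hypothetical, macro-F1 is an assumed metric (the summary does not state which one the study used), and the majority-class baseline is a placeholder for whatever AA method is being evaluated.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, str]            # (text, generator label), e.g. ("...", "human")
Predictor = Callable[[str], str]    # maps a text to a predicted generator label


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in `gold`."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def transfer_matrix(
    train: Callable[[List[Sample]], Predictor],
    data: Dict[str, Dict[str, List[Sample]]],
) -> Dict[Tuple[str, str], float]:
    """Train on each source language, then score on every target language.

    `data` maps a language code to {"train": [...], "test": [...]}.
    Keys of the result are (source_language, target_language) pairs.
    """
    matrix = {}
    for src, src_split in data.items():
        predict = train(src_split["train"])
        for tgt, tgt_split in data.items():
            gold = [label for _, label in tgt_split["test"]]
            pred = [predict(text) for text, _ in tgt_split["test"]]
            matrix[(src, tgt)] = macro_f1(gold, pred)
    return matrix


def train_majority(samples: List[Sample]) -> Predictor:
    """Placeholder baseline: always predicts the most frequent training label."""
    majority = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda text: majority
```

In such a matrix, the diagonal entries are in-language scores and the off-diagonal entries are transfer scores; grouping the off-diagonal cells by whether source and target share a language family is one way to examine the kind of cross-family gap the study reports.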

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.