Evaluative Judgement in Teaching AI-based Translation: A Classroom Case Study of AI-Mediated Translation and Post-Editing

Source: Computation and Language · Field: Education & Learning (Educational Technology (EdTech), Academic Research & Higher Education, Artificial Intelligence & Machine Learning) · Depth: Intermediate, quick

Summary

A classroom case study involving 23 fourth-year BA-level translation students examined how structured comparison of general-purpose LLMs and online machine translation (MT) systems fosters evaluative judgment in AI-mediated translation. Students in a Machine Translation and Post-editing course translated short specialized English Wikipedia texts into Catalan or Spanish. They generated four system outputs, evaluated them using both automatic metrics and human adequacy/fluency assessments, then selected one output for post-editing, justifying their decision in written reports. The study found that students did not treat automatic metrics as the final authority. Their final post-editing selections frequently diverged from metric rankings, instead being justified by factors such as adequacy, fluency, terminology, naturalness, and anticipated post-editing effort. This research analyzes student justification processes within an authentic classroom assignment rather than benchmarking system performance.

Key takeaway

Translation educators designing AI-mediated translation curricula should emphasize human evaluative judgment over sole reliance on automatic metrics. Guide students to justify post-editing choices with factors such as adequacy, fluency, terminology, and anticipated post-editing effort, rather than metric rankings alone. This approach cultivates critical thinking and prepares students for real-world translation scenarios that call for nuanced human assessment.

Key insights

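Students treated automatic metrics as one input rather than the final authority: their post-editing selections frequently diverged from metric rankings and were justified by adequacy, fluency, terminology, naturalness, and anticipated post-editing effort.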

Method

For each source text, students generated four system outputs using general-purpose LLMs and online MT systems, evaluated them with automatic metrics and human adequacy/fluency assessments, selected one output for post-editing, and justified the choice in a written report.
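The comparison at the heart of this method can be sketched in a few lines of Python. This is a minimal illustration, not the study's tooling: the system labels, the scores, the 1-5 rating scale, and the helper names `metric_ranking` and `selection_diverges` are all hypothetical placeholders, and no specific automatic metric is assumed.

```python
from dataclasses import dataclass


@dataclass
class SystemOutput:
    system: str          # hypothetical system label, not from the study
    metric_score: float  # placeholder automatic-metric score (higher = better)
    adequacy: int        # placeholder human adequacy rating, assumed 1-5 scale
    fluency: int         # placeholder human fluency rating, assumed 1-5 scale


def metric_ranking(outputs):
    """Return system labels ordered from best to worst metric score."""
    ranked = sorted(outputs, key=lambda o: o.metric_score, reverse=True)
    return [o.system for o in ranked]


def selection_diverges(outputs, chosen):
    """True when the chosen system is not the metric's top-ranked one."""
    return metric_ranking(outputs)[0] != chosen


# Illustrative values only; these are not results from the study.
outputs = [
    SystemOutput("LLM-A", 0.61, 4, 5),
    SystemOutput("LLM-B", 0.58, 5, 5),
    SystemOutput("MT-A", 0.66, 4, 3),
    SystemOutput("MT-B", 0.55, 3, 4),
]

chosen = "LLM-B"  # hypothetical student selection for post-editing
print(metric_ranking(outputs))
print(selection_diverges(outputs, chosen))
```

A divergence flag like this only records that a student departed from the metric ranking. The reasons for the departure (adequacy, fluency, terminology, naturalness, anticipated effort) still have to come from the written justification.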

Best for: NLP Engineer, AI Scientist, Research Scientist, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.