Evaluative Judgement in Teaching AI-based Translation: A Classroom Case Study of AI-Mediated Translation and Post-Editing
Summary
A classroom case study involving 23 fourth-year BA-level translation students examined how structured comparison of general-purpose LLMs and online machine translation (MT) systems fosters evaluative judgment in AI-mediated translation. Students in a Machine Translation and Post-editing course translated short specialized English Wikipedia texts into Catalan or Spanish. They generated four system outputs, evaluated them using both automatic metrics and human adequacy/fluency assessments, then selected one output for post-editing, justifying their decision in written reports. The study found that students did not treat automatic metrics as the final authority. Their final post-editing selections frequently diverged from metric rankings, instead being justified by factors such as adequacy, fluency, terminology, naturalness, and anticipated post-editing effort. This research analyzes student justification processes within an authentic classroom assignment rather than benchmarking system performance.
Key takeaway
Translation educators designing AI-mediated translation curricula should emphasize human evaluative judgment over sole reliance on automatic metrics. Guide students to justify post-editing choices based on adequacy, fluency, terminology, and anticipated effort rather than on metric rankings alone. This approach cultivates critical thinking and prepares students for professional translation work, where nuanced human assessment remains essential.
Key insights
Students prioritize human-centric factors like adequacy and post-editing effort over automatic metrics when evaluating AI-mediated translations.
Principles
- Automatic metrics are not definitive.
- Human judgment guides MT selection.
- Justification requires multi-factor analysis.
Method
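Working with short specialized English Wikipedia texts to be translated into Catalan or Spanish, students generated four system outputs from general-purpose LLMs and online MT systems, evaluated them with automatic metrics and human adequacy/fluency assessment, selected one output for post-editing, and justified their choice in written reports.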
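The comparison step in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the system names, the single `metric_score` field, the 1-5 rating scale, the simple averaging of adequacy and fluency, and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SystemOutput:
    name: str            # hypothetical system label
    metric_score: float  # hypothetical automatic metric, higher is better
    adequacy: float      # human rating, assumed 1-5 scale
    fluency: float       # human rating, assumed 1-5 scale


def rank_by_metric(outputs):
    """Order system names from best to worst automatic score."""
    ordered = sorted(outputs, key=lambda o: o.metric_score, reverse=True)
    return [o.name for o in ordered]


def rank_by_human(outputs):
    """Order system names by the mean of adequacy and fluency (an assumed aggregation)."""
    ordered = sorted(outputs, key=lambda o: (o.adequacy + o.fluency) / 2, reverse=True)
    return [o.name for o in ordered]


def diverges_from_metric(outputs, chosen):
    """True when the output chosen for post-editing is not the metric's top pick."""
    return chosen != rank_by_metric(outputs)[0]


# Placeholder values for illustration only; not data from the study.
outputs = [
    SystemOutput("system_a", 0.62, 3.5, 4.0),
    SystemOutput("system_b", 0.58, 4.5, 4.5),
    SystemOutput("system_c", 0.55, 4.0, 3.0),
    SystemOutput("system_d", 0.49, 3.0, 3.0),
]

print("Metric ranking:", rank_by_metric(outputs))
print("Human ranking: ", rank_by_human(outputs))
print("Choice diverges from metric:", diverges_from_metric(outputs, "system_b"))
```

A divergence flag like this only marks where a written justification is needed; the justification itself (terminology, naturalness, anticipated post-editing effort) stays a human judgment.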
In practice
- Integrate human assessment in MT workflows.
- Train evaluators on diverse criteria.
- Encourage justification beyond metrics.
Topics
- Machine Translation
- Post-editing
- Evaluative Judgment
- LLM Evaluation
- Translation Pedagogy
- Human-in-the-Loop AI
Best for: NLP Engineer, AI Scientist, Research Scientist, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.