Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset

2026-05-28 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The paper "Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset" identifies a critical supervision design flaw in existing formality transfer benchmarks, such as GYAFC. These benchmarks frame formality transfer as a symmetric bidirectional task, but their binary human rewrites capture relative stylistic shifts rather than absolute human perceptions of formality. This flaw causes models to produce "pseudo-formal" language that satisfies benchmark labels but lacks genuine formality. The authors quantify this misalignment and propose a new framework that reconceptualizes formality as a three-level graded dimension: informal, casual, and formal, with "casual" serving as an explicit intermediate state. Based on this, they introduce 3LF, a new dataset providing parallel supervision across these three levels. Training on 3LF substantially reduces informal-to-formal failures and improves alignment with human perception; for instance, GPT-4.1-nano's F1 score improved from 0.06 to 0.88 in the informal-to-formal direction, despite 3LF being smaller than GYAFC.

Key takeaway

NLP engineers building controllable text generation systems, especially for formality transfer, should critically assess the supervision design of their benchmarks. Binary formal/informal labels can lead models to generate pseudo-formal outputs. Consider treating formality as a graded spectrum with an intermediate "casual" state, and building datasets along the lines of 3LF. In the paper, this approach improved alignment with human perception and reduced informal-to-formal generation failures: GPT-4.1-nano's F1 score rose from 0.06 to 0.88 in that direction.

Key insights

Existing formality transfer benchmarks misalign supervision, leading models to generate pseudo-formal text; a graded "casual" anchor resolves this.

Principles

Formality is a graded dimension, not a binary one.
Supervision design shapes stylistic alignment.
Intermediate states clarify supervision signals.

Method

Reconceptualize formality as three levels: informal, casual, and formal. Introduce the 3LF dataset, which provides parallel supervision across these three levels, and train models on it to improve alignment with human perception of formality.
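
As a rough illustration of what three-level parallel supervision could look like in code, the sketch below stores one sentence at all three levels and expands it into directed source/target pairs. The `FormalityTriple` class, the `directed_pairs` helper, and the example sentences are hypothetical and invented for illustration; this summary does not describe the actual 3LF schema or which transfer directions the authors train on.

```python
from dataclasses import dataclass
from itertools import permutations

# The three graded levels named in the paper summary.
LEVELS = ("informal", "casual", "formal")


@dataclass(frozen=True)
class FormalityTriple:
    """One meaning expressed at three formality levels (hypothetical schema)."""
    informal: str
    casual: str
    formal: str


def directed_pairs(triple):
    """Expand one triple into every ordered (source, target) transfer pair.

    Returns a list of (source_level, target_level, source_text, target_text).
    Using all six directions is an assumption made for this sketch.
    """
    pairs = []
    for src, tgt in permutations(LEVELS, 2):
        pairs.append((src, tgt, getattr(triple, src), getattr(triple, tgt)))
    return pairs


# Invented example sentences, not taken from 3LF.
example = FormalityTriple(
    informal="gonna b late, sry",
    casual="Sorry, I'm going to be late.",
    formal="I apologize, but I will be arriving late.",
)

pairs = directed_pairs(example)
for src, tgt, src_text, tgt_text in pairs:
    print(f"{src} -> {tgt}: {src_text!r} => {tgt_text!r}")
```

Keeping the casual text as an explicit field means an informal-to-formal pair and an informal-to-casual pair have different targets, which is the distinction a binary formal/informal label cannot express.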

In practice

Re-evaluate existing benchmark labels for misalignment.
Design datasets with graded stylistic dimensions.
Use "casual" as an intermediate anchor for clarity.
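
One way to start re-evaluating benchmark labels is to collect three-level human ratings for sentences the benchmark marks as "formal" and look at how those ratings are distributed. The sketch below is a minimal version of such an audit; the `audit_formal_targets` function and the toy ratings are hypothetical, and this is not the misalignment measure used in the paper.

```python
from collections import Counter

LEVELS = ("informal", "casual", "formal")


def audit_formal_targets(human_ratings):
    """Share of each human-perceived level among benchmark-'formal' sentences.

    human_ratings: list of level strings, one per sentence that the benchmark
    labels as formal. A large 'casual' share would suggest the binary label
    encodes a relative shift rather than perceived formality.
    """
    if not human_ratings:
        raise ValueError("need at least one rating")
    counts = Counter(human_ratings)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    total = len(human_ratings)
    return {level: counts[level] / total for level in LEVELS}


# Toy ratings invented for illustration only.
toy_ratings = ["formal", "casual", "casual", "formal", "informal"]
shares = audit_formal_targets(toy_ratings)
print(shares)
```

On real data, each sentence would typically carry ratings from several annotators, so an aggregation step (for example, a majority vote) would come before this count.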

Topics

Formality Transfer
Text Generation
Dataset Design
Supervision Misalignment
Natural Language Processing
GPT-4.1-nano

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.