Self-Study Reconsidered: The Hidden Fragility of Learning from Self-Generated QA
Summary
Language models trained on self-generated question-answer (QA) supervision inherit a hidden fragility from the generation process itself. That process acts as an implicit policy: it selects which evidence gets asked about and dictates how it is answered, and it is vulnerable at both stages. During question generation, models do not scan documents uniformly. They concentrate on salient spans, and artifacts such as poorly cleaned markup can hijack the process, an effect seen across model families and scales. During answering, the model producing the supervision tends to comply with instruction-like passages embedded in the text, a problem that task conflict exacerbates and that is more prevalent in larger models. Both issues stem from QA generation choices and can be mitigated: tying questions to fixed targets reduces biased selection, and filtering instruction-like spans reduced mean injection compliance from 88% to 13% in evaluations.
Key takeaway
If you are an AI scientist or ML engineer building models with synthetic QA, recognize that the generation process itself introduces significant fragility. Your training data can be compromised in two ways: biased question selection that focuses on salient text, and unintended compliance with embedded instructions, especially with larger models. To improve robustness, tie questions to fixed targets and pre-filter instruction-like passages from source texts before QA generation. Both steps address the sources of the fragility directly.
Key insights
Self-generated QA supervision introduces fragility in language models through biased question selection and instruction compliance.
Principles
- QA generation is an implicit policy.
- Coverage saturates early on salient spans.
- Instruction compliance depends on intent.
Method
Reduce biased selection by tying each question to a fixed target. Lower injection compliance by filtering instruction-like spans before answering; in evaluations this reduced mean injection compliance from 88% to 13%.
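The filtering step can be illustrated with a minimal sketch. The summary does not describe how the original article detects instruction-like spans, so the regex patterns, the sentence splitter, and the function names below are all assumptions chosen for illustration; a real pipeline would likely need a tuned or learned detector.

```python
import re

# Hypothetical patterns for instruction-like text; not taken from the article.
INSTRUCTION_PATTERNS = [
    r"\bignore (?:all |any )?(?:previous|prior|above) instructions\b",
    r"\b(?:you must|you should|please) (?:answer|respond|reply|output|write)\b",
    r"\bdo not (?:answer|mention|reveal)\b",
    r"^\s*(?:system|assistant|user)\s*:",
]
_INSTRUCTION_RE = re.compile("|".join(INSTRUCTION_PATTERNS), re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_instruction_like(text):
    """Return (cleaned_text, dropped_sentences).

    Sentences matching any hypothetical pattern are removed before the
    text is handed to the answering model.
    """
    kept, dropped = [], []
    for sentence in split_sentences(text):
        if _INSTRUCTION_RE.search(sentence):
            dropped.append(sentence)
        else:
            kept.append(sentence)
    return " ".join(kept), dropped
```

Returning the dropped sentences alongside the cleaned text makes it possible to audit what the filter removes, which matters because a heuristic like this can also discard legitimate content.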
In practice
- Implement fixed-target question generation.
- Pre-filter instruction-like text.
- Scrutinize synthetic QA datasets.
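As one possible reading of fixed-target question generation, the sketch below splits a document into fixed word windows and requests exactly one question per window, so the generator cannot skip less salient spans. The window size, the prompt wording, and the `ask_model` callable are hypothetical placeholders, not details from the original article.

```python
from typing import Callable, Dict, List


def make_targets(text: str, window: int = 40) -> List[str]:
    """Split text into consecutive, non-overlapping windows of `window` words."""
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


def generate_fixed_target_qa(
    text: str,
    ask_model: Callable[[str], str],  # hypothetical: wraps whatever LLM call you use
    window: int = 40,
) -> List[Dict]:
    """Request one question per fixed target so every span is covered."""
    records = []
    for idx, target in enumerate(make_targets(text, window)):
        prompt = (
            "Write one question whose answer is contained in the passage below.\n"
            f"Passage: {target}"
        )
        records.append(
            {"target_id": idx, "target": target, "question": ask_model(prompt)}
        )
    return records
```

Keeping `target_id` in each record also supports the scrutiny step: counting questions per target shows whether coverage is uniform or still clustered.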
Topics
- Language Models
- Synthetic Data Generation
- Question Answering
- Model Fragility
- Fine-tuning
- Data Preprocessing
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.