System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5
Summary
A new system report details the development of PoetryQwen, a domain-specialized large language model for classical Chinese poetry translation and emotional understanding. The authors argue that existing research often neglects the distinctive features of poetic appreciation and lacks high-quality domain-specific datasets. To address this, they decompose the task into three subtasks: term interpretation, semantic interpretation, and emotional inference. They then construct the Classical Chinese Poetry Instruction Pair Dataset (CCPoetry-49K), comprising 49,404 instruction-response pairs, and create PoetryQwen by fine-tuning Qwen2.5-14B with Low-Rank Adaptation (LoRA). On the CCL25-Eval Task 5 benchmark, PoetryQwen scores 0.757 against 0.690 for the Qwen2.5-14B-Instruct baseline, an absolute gain of 0.067 and a relative improvement of 9.7%. The authors present this as evidence of more precise translation and better emotional understanding of classical poetry.
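The 9.7% figure is a relative gain over the baseline, not a difference in score points. A minimal sketch of the arithmetic, using only the two scores reported above (the function and variable names are illustrative):

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a percentage of `baseline`."""
    return (new - baseline) / baseline * 100.0


baseline_score = 0.690    # Qwen2.5-14B-Instruct, as reported
poetryqwen_score = 0.757  # PoetryQwen, as reported

absolute_gain = poetryqwen_score - baseline_score
relative_gain = relative_improvement(poetryqwen_score, baseline_score)

print(f"absolute gain: {absolute_gain:.3f}")   # 0.067
print(f"relative gain: {relative_gain:.1f}%")  # 9.7%
```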
Key takeaway
For NLP engineers building specialized LLMs for culturally rich or niche domains, this work outlines a workable recipe: decompose the complex task into subtasks, invest in a high-quality, domain-specific instruction dataset such as CCPoetry-49K, and fine-tune a strong base model such as Qwen2.5-14B with Low-Rank Adaptation (LoRA). In the report, this recipe gives PoetryQwen a 9.7% relative improvement over the Qwen2.5-14B-Instruct baseline on the benchmark (0.757 vs. 0.690).
Key insights
Domain-specific LoRA fine-tuning on Qwen2.5-14B with CCPoetry-49K significantly enhances classical Chinese poetry understanding and translation.
Principles
- Task decomposition aids domain-specific LLM optimization.
- High-quality, domain-specific datasets are critical.
- LoRA fine-tuning effectively adapts LLMs.
Method
The task was decomposed into three subtasks: term interpretation, semantic interpretation, and emotional inference. The authors constructed CCPoetry-49K, a dataset of 49,404 instruction-response pairs, and then applied LoRA fine-tuning to the Qwen2.5-14B model to create PoetryQwen.
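To make the LoRA step concrete, here is a minimal NumPy sketch of the general technique: the pretrained weight matrix stays frozen and only a low-rank update is trained. The rank, scaling factor, and layer size below are illustrative assumptions. The summary does not state the hyperparameters used for PoetryQwen, and this is not the authors' training code.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen pretrained weights
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_fraction(self):
        # Share of parameters that LoRA would actually update.
        lora_params = self.A.size + self.B.size
        return lora_params / (lora_params + self.weight.size)


rng = np.random.default_rng(1)
W = rng.normal(size=(512, 512))  # stand-in for one pretrained projection matrix
layer = LoRALinear(W, rank=8, alpha=16)
x = rng.normal(size=(4, 512))

# Because B starts at zero, the adapted layer initially matches the base layer.
assert np.allclose(layer.forward(x), x @ W.T)
print(f"trainable fraction: {layer.trainable_fraction():.4f}")
```

Because only `A` and `B` receive gradients, the number of trained parameters grows with the rank rather than with the full matrix size, which is what makes adapting a 14B-parameter model practical on modest hardware.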
In practice
- Utilize CCPoetry-49K for classical poetry tasks.
- Apply LoRA to Qwen2.5-14B for domain adaptation.
- Decompose complex tasks for LLM specialization.
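As a rough illustration of the decomposition, the sketch below turns one annotated poem into one instruction-response pair per subtask. The prompt templates, field names, and annotations are hypothetical and written for this example only. They are not taken from CCPoetry-49K, whose actual schema is not described in this summary.

```python
from dataclasses import dataclass

# Hypothetical prompt templates, one per subtask named in the report.
TEMPLATES = {
    "term_interpretation": "Explain the term '{term}' as used in this poem:\n{poem}",
    "semantic_interpretation": "Render this poem in plain modern language:\n{poem}",
    "emotional_inference": "Which emotion does this poem mainly express?\n{poem}",
}


@dataclass
class InstructionPair:
    subtask: str
    instruction: str
    response: str


def build_pairs(poem, term, annotations):
    """Create one instruction-response pair for each annotated subtask."""
    pairs = []
    for subtask, template in TEMPLATES.items():
        if subtask not in annotations:
            continue  # skip subtasks that have no reference answer
        instruction = template.format(poem=poem, term=term)
        pairs.append(InstructionPair(subtask, instruction, annotations[subtask]))
    return pairs


# Li Bai's "Quiet Night Thought"; the annotations are illustrative.
poem = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
annotations = {
    "term_interpretation": "故乡 refers to the speaker's hometown.",
    "semantic_interpretation": (
        "Moonlight before my bed looks like frost on the ground. "
        "I raise my head to look at the moon, then lower it and think of home."
    ),
    "emotional_inference": "Homesickness.",
}

pairs = build_pairs(poem, term="故乡", annotations=annotations)
for pair in pairs:
    print(pair.subtask, "->", pair.response)
```

Keeping the subtask name on each record makes it easy to balance the training mix or to evaluate each subtask separately.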
Topics
- Large Language Models
- Classical Chinese Poetry
- LoRA Fine-tuning
- Qwen2.5
- Domain Adaptation
- Instruction Datasets
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.