[P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, short

Summary

A QLoRA fine-tuning project classifies English text into the six CEFR proficiency levels (A1–C2) using Qwen2.5-1.5B. The base model is loaded with 4-bit NF4 quantization, and only about 0.28% of its parameters are trained. The dataset consists of 1,785 synthetically generated English texts, balanced across the CEFR levels and 10 domains, produced with Llama-3.3-70B through the Groq API under constraints intended to preserve linguistic patterns. On a held-out test set of 179 samples, the model reached 84.9% on both accuracy and macro F1. Per-level recall ranged from 96.6% for A1 down to 60.0% for C2, with most errors being confusions between C1 and C2. The project also includes a FastAPI inference API and a Docker deployment setup.
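The reported metrics (accuracy, macro F1, per-level recall) can be computed from a list of true and predicted labels. The following is a minimal, dependency-free sketch and not the project's evaluation code. The function names and the toy labels in the usage lines are made up for illustration.

```python
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def per_level_scores(y_true, y_pred, levels=LEVELS):
    """Return precision, recall and F1 for each CEFR level."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true level was missed
            fp[pred] += 1   # the predicted level was wrongly assigned
    scores = {}
    for level in levels:
        predicted = tp[level] + fp[level]
        actual = tp[level] + fn[level]
        precision = tp[level] / predicted if predicted else 0.0
        recall = tp[level] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[level] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-level F1, so every level counts equally."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted level matches the true level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (invented): one C2 text is misread as C1.
y_true = ["A1", "A1", "C1", "C2", "C2"]
y_pred = ["A1", "A1", "C1", "C1", "C2"]
toy_scores = per_level_scores(y_true, y_pred, levels=["A1", "C1", "C2"])
print(accuracy(y_true, y_pred), macro_f1(toy_scores), toy_scores["C2"]["recall"])
```

Because macro F1 averages the levels without weighting, a weak level such as C2 pulls the score down even when the easier levels are nearly perfect.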

Key takeaway

For NLP engineers developing language learning applications, this project demonstrates a viable approach to CEFR classification. You should consider QLoRA with smaller models like Qwen2.5-1.5B for efficient deployment. Be aware that synthetic data may introduce distribution shifts, especially for nuanced levels like C2, and plan for validation with authentic learner data to improve real-world performance.

Key insights

QLoRA fine-tuning of Qwen2.5-1.5B reaches 84.9% accuracy on the held-out test set despite relying on synthetic training data, though recall falls to 60.0% at C2.

Principles

Method

Fine-tune Qwen2.5-1.5B with QLoRA (4-bit NF4) and a linear classification head on a synthetically generated, balanced text dataset for multi-class CEFR proficiency classification.
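One way this method could be set up with Hugging Face `transformers`, `peft`, and `bitsandbytes` is sketched below. It is an illustration under assumptions, not the project's training code. The summary does not state the checkpoint, LoRA rank, alpha, dropout, or target modules, so the values here (`Qwen/Qwen2.5-1.5B`, `rank=16`, `alpha=32`, `lora_dropout=0.05`, the attention projections) are placeholders, and the function name `build_qlora_classifier` is hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
ID2LABEL = dict(enumerate(LEVELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def build_qlora_classifier(model_name="Qwen/Qwen2.5-1.5B", rank=16, alpha=32):
    """Load a 4-bit NF4 base model with a 6-way head and attach LoRA adapters.

    All hyperparameters are assumptions; the source does not specify them.
    """
    # Heavy dependencies are imported lazily so the label maps above can be
    # used without torch, transformers, peft, or bitsandbytes installed.
    import torch
    from peft import (
        LoraConfig,
        TaskType,
        get_peft_model,
        prepare_model_for_kbit_training,
    )
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LEVELS),  # adds a linear classification head
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        quantization_config=quant_config,
    )
    # Sequence classification needs a pad token id to find the last real token.
    model.config.pad_token_id = tokenizer.pad_token_id

    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,  # keeps the classification head trainable
        r=rank,
        lora_alpha=alpha,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    )
    model = get_peft_model(model, lora_config)
    return tokenizer, model
```

After calling `build_qlora_classifier()` on a machine with a CUDA GPU, `model.print_trainable_parameters()` reports the trainable share, which can be compared against the roughly 0.28% stated in the summary.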

Best for: Machine Learning Engineer, NLP Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.