PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis
Summary
A systematic study investigates parameter-efficient fine-tuning (PEFT) using Low-Rank Adaptation (LoRA) on Qwen2.5-3B to develop a domain-specific conversational assistant for telecommunications customer support. The methodology introduces a combinatorial approach to synthetic dataset generation: a glossary of 52 industry-specific terms is expanded into 1,560 distinct problem scenarios, from which a generative pipeline using Gemini 2.0 Flash produces approximately 30,000 training examples. The researchers evaluate 16 distinct LoRA configurations, systematically varying hyperparameters and target modules. Beyond traditional performance metrics, the study measures training energy consumption (284 to 1,371 Wh, roughly a 5x spread) and performs a qualitative evaluation using the LLM-as-a-judge methodology with GPT-5.2 and Claude 4.5 Sonnet. The findings reveal a striking divergence: the fine-tuned configuration with the lowest validation loss (0.5024) ranks 6th-7th qualitatively, while the configuration with the highest validation loss (0.6807) is ranked 1st by both LLM judges. Validation loss alone is therefore insufficient for selecting conversational AI models.
Key takeaway
For MLOps engineers deploying domain-specific conversational AI, selecting a model on validation loss alone is misleading: the fine-tuning configuration with the lowest loss may not deliver the best perceived conversational quality. Integrate LLM-as-a-judge evaluations (e.g., with GPT-5.2 or Claude 4.5 Sonnet) and energy consumption analysis into your selection pipeline. Prioritize configurations that balance qualitative performance and energy efficiency, such as configuration 4 or 8, over those that merely minimize loss.
Key insights
Validation loss alone is insufficient for selecting conversational AI models; qualitative evaluation is crucial.
Principles
- Broader LoRA target module coverage reduces loss more than higher rank.
- Lower LoRA ranks (e.g., r=16) can outperform higher ranks (r=32).
- Fast convergence can reduce total training energy despite higher per-step cost.
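The trade-off between rank and target module coverage can be made concrete by counting trainable parameters. For a linear layer of shape (d_out, d_in), LoRA adds two matrices of shapes (r, d_in) and (d_out, r), so r * (d_in + d_out) parameters. The sketch below applies that formula; the layer dimensions and layer count are assumptions chosen for illustration and should be read from the actual model configuration, and the module names are the ones commonly used for Qwen-style decoders rather than values taken from the study.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


# ASSUMED dimensions for illustration; check the real model config before relying on them.
HIDDEN, KV_DIM, FFN, LAYERS = 2048, 256, 11008, 36

# (d_in, d_out) per module; names are the conventional ones, not confirmed by the source.
ATTENTION = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}
FEED_FORWARD = {
    "gate_proj": (HIDDEN, FFN),
    "up_proj": (HIDDEN, FFN),
    "down_proj": (FFN, HIDDEN),
}


def total_lora_params(modules, r, layers=LAYERS):
    """Sum LoRA parameters over the targeted modules in every layer."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in modules.values())
    return per_layer * layers


attention_r32 = total_lora_params(ATTENTION, r=32)
all_modules_r16 = total_lora_params({**ATTENTION, **FEED_FORWARD}, r=16)
print(f"attention only, r=32: {attention_r32:,}")
print(f"attention + FFN, r=16: {all_modules_r16:,}")
```

Under these assumed dimensions, halving the rank while adding the feed-forward projections still increases the adapter size, so rank and coverage are better compared as separate axes than as interchangeable ways of adding capacity.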
Method
Generate synthetic data by factorizing domain knowledge (terms, causes, contexts) and expanding with an LLM (Gemini 2.0 Flash).
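A minimal sketch of the factorized approach, assuming the scenario space is the Cartesian product of terms, causes, and contexts. The glossary entries, cause and context lists, and prompt wording below are hypothetical placeholders, since the summary does not list the actual ones, and the call to the generating LLM is deliberately left out.

```python
from itertools import product

# Hypothetical slices of the domain factors; the study uses 52 glossary terms.
TERMS = ["APN", "roaming", "SIM swap"]
CAUSES = ["misconfiguration", "network outage"]
CONTEXTS = ["prepaid mobile", "business fiber"]


def build_scenarios(terms, causes, contexts):
    """Return one scenario dict per (term, cause, context) combination."""
    return [
        {"term": t, "cause": c, "context": x}
        for t, c, x in product(terms, causes, contexts)
    ]


def to_prompt(scenario, n_dialogues=3):
    """Render a generation prompt for an LLM; the wording is illustrative only."""
    return (
        f"Write {n_dialogues} customer support dialogues about "
        f"'{scenario['term']}' where the root cause is {scenario['cause']} "
        f"for a {scenario['context']} customer."
    )


scenarios = build_scenarios(TERMS, CAUSES, CONTEXTS)
prompts = [to_prompt(s) for s in scenarios]
print(len(scenarios), "scenarios")
print(prompts[0])
```

Each prompt would then be sent to the generating model, and the returned dialogues collected as training examples. Keeping the factors in separate lists makes coverage easy to audit: every combination appears exactly once.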
In practice
- Use r=16 when fine-tuning a 3B-parameter model for conversation.
- Target attention modules for simple datasets; add FFN modules for complex ones.
- Incorporate LLM-as-a-judge evaluation and energy measurements into model selection.
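One way to combine judge rankings and energy measurements in a selection step is sketched below. The `Run` record, the energy budget, and the selection rule (best mean judge rank within the budget, ties broken by lower energy) are assumptions for illustration, not the study's procedure, and the numbers are toy values rather than reported results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    name: str
    val_loss: float
    judge_ranks: tuple  # rank assigned by each LLM judge, 1 = best
    energy_wh: float


def energy_wh(power_samples_w, interval_s):
    """Integrate evenly spaced power readings (watts) into watt-hours."""
    return sum(power_samples_w) * interval_s / 3600.0


def select(runs, energy_budget_wh):
    """Pick the run with the best mean judge rank among those within the energy budget."""
    eligible = [r for r in runs if r.energy_wh <= energy_budget_wh]
    if not eligible:
        raise ValueError("no run fits the energy budget")
    return min(eligible, key=lambda r: (mean(r.judge_ranks), r.energy_wh))


# Toy values invented for illustration only; NOT results from the study.
runs = [
    Run("toy_a", val_loss=0.50, judge_ranks=(6, 7), energy_wh=900.0),
    Run("toy_b", val_loss=0.68, judge_ranks=(1, 1), energy_wh=1300.0),
    Run("toy_c", val_loss=0.60, judge_ranks=(2, 3), energy_wh=400.0),
]

by_loss = min(runs, key=lambda r: r.val_loss)
by_quality = select(runs, energy_budget_wh=1000.0)
print("lowest loss:", by_loss.name)
print("selected within budget:", by_quality.name)
```

With these toy values, loss-based selection and judge-based selection pick different runs, which is the kind of divergence the summary describes. The budget acts as a hard constraint here; a weighted score would be an equally reasonable design if energy should trade off smoothly against quality.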
Topics
- Parameter-Efficient Fine-Tuning
- LoRA Configurations
- Telecommunications AI
- Synthetic Data Generation
- LLM-as-a-Judge
- Energy Efficiency
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.