PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A systematic study investigates parameter-efficient fine-tuning (PEFT) using Low-Rank Adaptation (LoRA) on Qwen2.5-3B to develop a domain-specific conversational assistant for telecommunications customer support. The methodology introduces a combinatorial approach to synthetic dataset generation: a glossary of 52 industry-specific terms is expanded into approximately 30,000 training examples covering 1,560 distinct problem scenarios, using a generative pipeline built on Gemini 2.0 Flash. The researchers evaluate 16 distinct LoRA configurations empirically, systematically varying hyperparameters and target modules. Beyond traditional performance metrics, the study adds energy consumption analysis (284 to 1,371 Wh, a roughly 5x variation) and qualitative evaluation using LLM-as-a-judge methodology with GPT-5.2 and Claude 4.5 Sonnet. The findings reveal a striking divergence: the fine-tuned configuration with the lowest validation loss (0.5024) ranks 6th-7th qualitatively, while the configuration with the highest validation loss (0.6807) is ranked 1st by both human-aligned judges. This shows that validation loss alone is insufficient for selecting conversational AI models.
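The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the per-term and per-scenario averages are derived here, not reported by the study.

```python
# Figures quoted in the summary (30,000 is stated as approximate).
GLOSSARY_TERMS = 52
SCENARIOS = 1_560
TRAINING_EXAMPLES = 30_000
ENERGY_MIN_WH = 284
ENERGY_MAX_WH = 1_371

# Derived averages (not reported in the summary itself).
scenarios_per_term = SCENARIOS / GLOSSARY_TERMS
examples_per_scenario = TRAINING_EXAMPLES / SCENARIOS

# Ratio between the most and least energy-hungry configurations.
energy_ratio = ENERGY_MAX_WH / ENERGY_MIN_WH

print(f"scenarios per glossary term: {scenarios_per_term:.0f}")
print(f"examples per scenario (approx.): {examples_per_scenario:.1f}")
print(f"energy max/min ratio: {energy_ratio:.2f}")
```

The energy ratio comes out a little under 5, which is consistent with the "5x variation" quoted in the summary.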

Key takeaway

For MLOps engineers deploying domain-specific conversational AI, relying solely on validation loss for model selection can be misleading: the fine-tuning configuration with the lowest loss may not deliver the best perceived conversational quality. Integrate LLM-as-a-judge evaluations (e.g., with GPT-5.2 or Claude 4.5 Sonnet) and energy consumption analysis into your selection pipeline, and prioritize configurations that balance qualitative performance and energy efficiency, such as configuration 4 or 8, over those that merely minimize loss.
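One way to combine judge rankings and energy use in a selection step is a Pareto filter: keep every configuration that no other configuration beats on both criteria. The sketch below is a minimal illustration under stated assumptions. The `Candidate` type, the configuration names, and all numeric values are hypothetical placeholders, not results from the study, and the study does not say it used a Pareto filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    val_loss: float         # lower is better, but not used for the filter
    mean_judge_rank: float  # average rank across LLM judges, lower is better
    energy_wh: float        # training energy in Wh, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and better on one."""
    no_worse = (a.mean_judge_rank <= b.mean_judge_rank
                and a.energy_wh <= b.energy_wh)
    strictly_better = (a.mean_judge_rank < b.mean_judge_rank
                       or a.energy_wh < b.energy_wh)
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical placeholder values, for illustration only.
candidates = [
    Candidate("cfg_a", val_loss=0.50, mean_judge_rank=6.5, energy_wh=900.0),
    Candidate("cfg_b", val_loss=0.68, mean_judge_rank=1.0, energy_wh=1200.0),
    Candidate("cfg_c", val_loss=0.60, mean_judge_rank=3.0, energy_wh=400.0),
    Candidate("cfg_d", val_loss=0.55, mean_judge_rank=4.0, energy_wh=300.0),
    Candidate("cfg_e", val_loss=0.58, mean_judge_rank=5.0, energy_wh=700.0),
]

front = pareto_front(candidates)
best_by_loss = min(candidates, key=lambda c: c.val_loss)

print("Pareto front:", [c.name for c in front])
print("Lowest validation loss:", best_by_loss.name)
```

In this toy data the lowest-loss candidate is not on the front, which mirrors the kind of divergence the summary describes. The final choice among front members still depends on how much energy a team is willing to spend for a better judge ranking.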

Key insights

Validation loss alone is insufficient for selecting conversational AI models; qualitative evaluation is crucial.

Principles

Method

Generate synthetic data by factorizing domain knowledge (terms, causes, contexts) and expanding with an LLM (Gemini 2.0 Flash).
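The factorization step can be sketched as a Cartesian product over the three factors, with each combination rendered into a prompt for the expanding LLM. This is a minimal sketch under stated assumptions: the example terms, causes, contexts, and the prompt wording are hypothetical placeholders, and the call to Gemini 2.0 Flash is deliberately left out because the summary does not describe how the model is invoked.

```python
from itertools import product

# Hypothetical placeholder factors; the study uses a 52-term glossary.
terms = ["SIM swap", "roaming", "APN"]
causes = ["misconfiguration", "billing hold"]
contexts = ["prepaid mobile", "business fiber"]


def build_scenarios(terms, causes, contexts):
    """Enumerate every (term, cause, context) combination as a scenario."""
    return [
        {"term": term, "cause": cause, "context": context}
        for term, cause, context in product(terms, causes, contexts)
    ]


def render_prompt(scenario):
    """Turn one scenario into a generation prompt (wording is illustrative)."""
    return (
        f"Write a customer-support dialogue about '{scenario['term']}' "
        f"caused by {scenario['cause']} for a {scenario['context']} customer."
    )


scenarios = build_scenarios(terms, causes, contexts)
prompts = [render_prompt(s) for s in scenarios]

print(len(scenarios), "scenarios")
print(prompts[0])
```

Each prompt would then be sent to the generator model several times to obtain multiple training examples per scenario. Enumerating the scenarios up front makes coverage explicit and easy to audit before any generation budget is spent.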

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.