PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

2026-05-15 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Parameter-efficient Multi-Task Learning (PEML) is a framework for fine-tuning a Large Language Model (LLM) on multiple tasks simultaneously. It targets a limitation of existing parameter-efficient fine-tuning (PEFT) methods such as LoRA and Prefix Tuning, which are primarily single-task oriented or lack robust multi-task adaptation. PEML combines two components: PrefixNAS, a neural architecture search method that optimizes continuous prompts, and low-rank adaptation (LoRA) of the model weights. Co-optimizing the two lets the prefix structure adjust dynamically while the model adapts efficiently, and it reduces memory usage and computational overhead compared with deploying individual models or task-specific adapters. Evaluated against state-of-the-art multi-task learning methods on benchmarks including GLUE, SuperGLUE, and MMLU, PEML improves average accuracy by up to 6.67%, with gains of up to 10.75% on individual tasks. The framework also uses less VRAM and has lower inference latency, owing to its unified adapter design.

Key takeaway

For AI engineers and research scientists building multi-task LLM systems, PEML offers a way to improve both accuracy and resource efficiency. Its integrated prompt optimization and model adaptation can yield significant accuracy gains while reducing VRAM consumption and inference latency compared with traditional PEFT methods. Consider PEML when deploying a single LLM across diverse tasks, especially where resource consolidation and dynamic task adaptation are critical.

Key insights

PEML co-optimizes continuous prompts via neural architecture search and model weights via LoRA for efficient multi-task LLM fine-tuning.

Principles

Co-optimize prompt and model adaptation for multi-task efficiency.
Unified adapters reduce inference latency and VRAM usage.
Neural architecture search can discover robust, generalizable prefix structures.
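To make the second principle concrete, the following is a rough back-of-the-envelope sketch in Python of why one unified adapter set is smaller than one set per task. Everything in it is an assumption for illustration: the helper names, the model dimensions, the choice of two LoRA-adapted matrices per layer, and the key/value prefix layout are hypothetical and are not taken from the PEML paper. It counts trainable adapter parameters only, not measured VRAM or latency.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def prefix_params(prefix_len, hidden, n_layers):
    """Trainable parameters for a key prefix and a value prefix in every layer."""
    return 2 * prefix_len * hidden * n_layers


def adapter_set_params(hidden, n_layers, rank, prefix_len, adapted_per_layer=2):
    """Parameters in one adapter set: LoRA on a few square matrices per layer, plus prefixes."""
    lora = lora_params(hidden, hidden, rank) * adapted_per_layer * n_layers
    return lora + prefix_params(prefix_len, hidden, n_layers)


# Hypothetical configuration, chosen only to make the arithmetic easy to follow.
HIDDEN, LAYERS, RANK, PREFIX_LEN, N_TASKS = 4096, 32, 8, 16, 8

one_set = adapter_set_params(HIDDEN, LAYERS, RANK, PREFIX_LEN)
per_task_total = N_TASKS * one_set   # a separate adapter set for every task
unified_total = one_set              # a single adapter set shared by all tasks

print(one_set)         # 8388608
print(per_task_total)  # 67108864
print(unified_total)   # 8388608
```

Under these assumed numbers, the per-task layout stores eight times as many adapter parameters as the unified one. Real savings depend on the base model, the ranks and prefix lengths actually selected, and how adapters are loaded at inference time.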

Method

PEML jointly optimizes the LoRA parameters and the PrefixNAS architecture parameters using gradient-based neural architecture search. It constructs prefixes shared across tasks and applies LoRA to the model weights, minimizing a combined objective: the task loss plus an architectural regularization term.
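The pieces named in this description can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the softmax relaxation over candidate prefixes follows the general style of differentiable architecture search, the entropy term merely stands in for the unspecified architectural regularizer, and all function names and the weight `lam` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def mixed_prefix(candidates, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of same-sized candidate prefixes."""
    weights = softmax(alpha)
    dim = len(candidates[0])
    return [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]


def arch_entropy(alpha):
    """Assumed regularizer: entropy of the architecture weights (low when one candidate dominates)."""
    return -sum(w * math.log(w) for w in softmax(alpha) if w > 0.0)


def combined_loss(task_losses, alpha, lam=0.1):
    """Sum of per-task losses plus the weighted architectural regularizer."""
    return sum(task_losses) + lam * arch_entropy(alpha)


# Tiny usage example with made-up values.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
A = [[1.0, 1.0]]               # rank-1 down-projection (1x2)
B = [[0.5], [0.0]]             # rank-1 up-projection (2x1)
print(lora_forward(W, A, B, [2.0, 3.0]))                      # [4.5, 3.0]
print(mixed_prefix([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))     # [0.5, 0.5]
```

In a real training loop, an autodiff framework would compute gradients of `combined_loss` with respect to `A`, `B`, the candidate prefixes, and `alpha` in the same backward pass, which is what makes the optimization joint rather than sequential.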

In practice

Consider PEML for multi-task LLM deployments to consolidate resources.
Evaluate PrefixNAS for dynamic prompt optimization in complex scenarios.
Prefer joint (parallel) optimization of LoRA and prefix parameters over tuning them sequentially.

Topics

Parameter-Efficient Fine-Tuning
Multi-Task Learning
Large Language Models
Low-Rank Adaptation
Continuous Prompts

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.