Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU
Summary
A real-time conversational assistant has been developed that guides users through procedural manual tasks, such as furniture assembly, using only audio and Inertial Measurement Unit (IMU) data from a wearable device. Compared with video-based systems, this approach significantly reduces computational cost and enhances user privacy. Researchers from Qualcomm Technologies, Inc. created a dataset of 600 conversations and introduced a novel User Whim Agnostic (UWA) LoRA finetuning method, applied to language models such as Qwen2.5-1.5B-Instruct and Qwen2.5-3B-Instruct. The finetuning yielded an F-score improvement of more than 30% and a 16x inference speedup by enabling the model to suppress uninformative dialogue while retaining critical instructions. The system runs entirely on edge devices, using Snapdragon W5 Gen 1 and Dragonwing IQ9 processors, with Whisper-medium for automatic speech recognition (ASR) and MeloTTS-English for text-to-speech (TTS).
Key takeaway
Machine Learning Engineers developing real-time, privacy-sensitive conversational AI should explore non-video modalities such as audio and IMU data from wearables. The User Whim Agnostic (UWA) LoRA finetuning method can significantly improve model efficiency and user experience by reducing uninformative dialogue and speeding up inference by 16x. Consider edge deployment on Qualcomm processors for low-latency, cloud-independent operation, which improves both privacy and responsiveness for procedural task assistants.
Key insights
A privacy-preserving, edge-deployed conversational assistant uses audio/IMU and UWA LoRA finetuning for proactive, efficient task guidance.
Principles
- Lightweight modalities enhance privacy and reduce compute.
- Finetuning improves conversational restraint and instruction quality.
- Edge deployment enables real-time, cloud-independent assistance.
Method
The system captures audio/IMU from a smartwatch, recognizes activities, transcribes user speech, and feeds recent dialogues to a LoRA-finetuned language model. A rule-based step tracker provides context and suggests messages, with UWA finetuning optimizing proactive instruction delivery.
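The flow described above can be pictured with a minimal sketch. It covers only the rule-based step tracker and the prompt assembly, and it assumes the activity label and the transcribed speech have already been produced upstream. The step list, the activity labels, the `StepTracker` class, and the prompt layout are all hypothetical names chosen for illustration; they are not taken from the original system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical task steps, each paired with the activity label
# that is assumed to signal the step has been carried out.
STEPS = [
    ("Attach the legs to the seat", "screwing"),
    ("Fix the backrest in place", "hammering"),
]


@dataclass
class StepTracker:
    """Rule-based tracker: advance when the expected activity is recognized."""

    index: int = 0

    def update(self, activity: str) -> Optional[str]:
        """Return a suggested message when a step completes, else None."""
        if self.index >= len(STEPS):
            return None  # task already finished
        _, expected = STEPS[self.index]
        if activity != expected:
            return None  # unrelated activity, nothing to suggest
        self.index += 1
        if self.index < len(STEPS):
            return f"Next step: {STEPS[self.index][0]}."
        return "All steps are complete."


def build_prompt(history, suggestion, max_turns=4):
    """Combine the most recent dialogue turns with the tracker's suggestion."""
    recent = list(history)[-max_turns:]
    lines = [f"{speaker}: {text}" for speaker, text in recent]
    if suggestion:
        lines.append(f"[tracker suggestion] {suggestion}")
    lines.append("assistant:")  # the language model would continue from here
    return "\n".join(lines)


if __name__ == "__main__":
    history = deque(maxlen=8)  # rolling window of (speaker, text) turns
    tracker = StepTracker()
    history.append(("user", "I think the legs are on."))
    suggestion = tracker.update("screwing")
    print(build_prompt(history, suggestion))
```

In a real deployment the string returned by `build_prompt` would be passed to the finetuned language model, which decides whether to speak and what to say; that call is deliberately left out here.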
In practice
- Use UWA LoRA to reduce LLM verbosity in task guidance.
- Deploy activity recognition and LLM on edge for privacy.
- Generate synthetic conversation data with LLMs for training.
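As a rough illustration of the first and last points, the sketch below shows one way synthetic conversation turns might be turned into finetuning examples that teach a model to stay quiet when it has nothing useful to add. This is not the UWA procedure itself, whose details are not given in this summary. The `SILENCE` marker, the `informative` flag, and the record layout are all assumptions made for the example.

```python
# Hypothetical marker meaning "the assistant should say nothing here".
SILENCE = "<silent>"


def make_examples(turns):
    """Build prompt/completion pairs from labeled synthetic turns.

    Each turn is a dict with keys 'context' (str), 'reply' (str), and
    'informative' (bool). Uninformative replies are replaced by SILENCE
    so the finetuned model can learn to withhold them.
    """
    examples = []
    for turn in turns:
        target = turn["reply"] if turn["informative"] else SILENCE
        examples.append({"prompt": turn["context"], "completion": target})
    return examples


if __name__ == "__main__":
    synthetic = [
        {"context": "user: which screw goes here?",
         "reply": "Use the long screw.", "informative": True},
        {"context": "user: hmm, nice weather today.",
         "reply": "Yes, it is lovely!", "informative": False},
    ]
    for example in make_examples(synthetic):
        print(example)
```

The resulting pairs could then be fed to any LoRA finetuning pipeline. How the informative flag is assigned, by a rule or by an LLM judge, is left open here.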
Topics
- Conversational AI
- Edge AI
- Wearable Devices
- LoRA Finetuning
- Multimodal Sensing
- Procedural Task Guidance
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.