Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects
Summary
A virtual assistant (VA) built on Retrieval-Augmented Generation (RAG) was developed and evaluated to help Maastricht University students with project-specific regulations. The VA combines a multi-query mechanism with a self-reflection system to improve accuracy and to address two problems LLMs face in specialized domains: hallucinations and the difficulty of giving context-specific responses. The system was tested with 64 DACS bachelor students on June 20-21, 2024, using eight scenarios. Evaluation metrics included Context Precision (88%), Context Recall (42%), Answer Relevancy (57%), and Faithfulness (43%), with an average response time of 10.045 seconds. GPT-3.5 generally outperformed Gemini 1.0 Pro. Student feedback indicated that, in most scenarios, using the VA reduced "I don't know" responses and increased correct answers, with some exceptions.
Key takeaway
Research scientists developing domain-specific LLM applications should consider integrating a robust RAG system with multi-query retrieval and a self-reflection mechanism. In this study the approach improved accuracy and reduced uncertainty in answers to student inquiries, and it can mitigate common LLM problems such as hallucinations and weak contextual relevance. Prioritize comprehensive evaluation frameworks such as RAGAS, together with real-world user testing, to refine system performance and identify where training data needs improvement.
Key insights
RAG-based virtual assistants with self-reflection can effectively address LLM limitations in specialized academic contexts.
Principles
- Multi-query retrieval improves document relevance.
- Self-reflection mechanisms reduce LLM hallucinations.
- Low temperature and structured prompts enhance answer accuracy.
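To make the third principle concrete, here is a minimal sketch of a structured XML prompt paired with a low sampling temperature. The tag names, the instruction wording, the `temperature` value of 0.0, and the sample question and passage are all illustrative assumptions; this summary does not give the study's actual prompt or settings.

```python
from xml.sax.saxutils import escape


def build_prompt(question, passages):
    """Wrap retrieved passages and the question in XML-style tags.

    Tag names are hypothetical; escape() keeps stray '<', '>' and '&'
    in the content from being confused with the prompt's own tags.
    """
    docs = "\n".join(
        f'<document index="{i}">{escape(text)}</document>'
        for i, text in enumerate(passages, start=1)
    )
    return (
        "<instructions>Answer only from the documents. "
        "If they do not contain the answer, say you do not know.</instructions>\n"
        f"<documents>\n{docs}\n</documents>\n"
        f"<question>{escape(question)}</question>"
    )


# Assumed setting: the source says "low temperature" without giving a value,
# and the parameter name varies by model provider.
GENERATION_SETTINGS = {"temperature": 0.0}

# Made-up question and passage, for illustration only.
prompt = build_prompt(
    "What is the deadline for <Project 2-1>?",
    ["Reports are due & graded in week 7."],
)
```

Delimiting the instructions, documents, and question separately gives the model an unambiguous boundary between what it must follow and what it may quote from.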
Method
The VA pipeline has three stages: retrieval (multi-query generation, vector database search, Reciprocal Rank Fusion (RRF), a reranker, and few-shot examples), generation (low-temperature XML prompts), and self-reflection (question rewrite, hallucination check, answer check, and clarification questions).
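The fusion step in the retrieval stage can be sketched as follows. Reciprocal Rank Fusion merges the ranked lists returned for each query variant by scoring every document with the sum of 1 / (k + rank) over the lists in which it appears. The constant `k=60` is a commonly used default, not a value taken from this study, and the document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for three rewrites of one student question.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Documents that rank well across several query variants rise to the top even if no single variant placed them first, which is why RRF pairs naturally with multi-query retrieval.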
In practice
- Use RAG to integrate domain-specific knowledge.
- Implement multi-query retrieval for diverse query perspectives.
- Employ self-reflection for hallucination detection and correction.
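The self-reflection step in the last item can be sketched as a small control loop: generate an answer, check it for grounding and relevance, rewrite the question on failure, and fall back to a clarification question. This is only one possible arrangement under stated assumptions. The function names, the retry limit, and the toy stand-ins below are hypothetical, and in a real system each check would typically be an LLM call rather than a string comparison.

```python
def answer_with_reflection(question, retrieve, generate, is_grounded,
                           answers_question, rewrite, max_attempts=2):
    """Retry generation until the answer passes both checks or attempts run out."""
    current = question
    for _ in range(max_attempts):
        passages = retrieve(current)
        answer = generate(current, passages)
        # Hallucination check, then answer check against the original question.
        if is_grounded(answer, passages) and answers_question(answer, question):
            return {"type": "answer", "text": answer}
        # A check failed: rewrite the question and try again.
        current = rewrite(current)
    # No attempt passed: ask the student for clarification instead.
    return {"type": "clarification",
            "text": "Could you give more detail about your question?"}


# Toy stand-ins so the sketch runs without an LLM or a vector database.
corpus = {"project deadline": "The report is due in week 7."}

def retrieve(q):
    return [text for key, text in corpus.items() if key in q.lower()]

def generate(q, passages):
    return passages[0] if passages else "I think it is week 9."

def is_grounded(answer, passages):
    return answer in passages

def answers_question(answer, q):
    return bool(answer)

def rewrite(q):
    return q + " project deadline"

result = answer_with_reflection("When is it due?", retrieve, generate,
                                is_grounded, answers_question, rewrite)
```

In this toy run the first attempt retrieves nothing and produces an ungrounded guess, so the question is rewritten and the second attempt returns the grounded passage.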
Topics
- Generative AI
- Retrieval-Augmented Generation
- Virtual Assistant
- Large Language Models
- Academic Assistance
Best for: Research Scientist, NLP Engineer, AI Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.