Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects

2024-06-01 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

A virtual assistant (VA) built on Retrieval-Augmented Generation (RAG) was developed and evaluated to help Maastricht University students with project-specific regulations. The VA combines a multi-query retrieval mechanism with a self-reflection system to improve accuracy and to address hallucinations and the need for context-specific answers in a specialized domain. It was tested with 64 DACS bachelor students on June 20-21, 2024, using eight scenarios. Reported evaluation scores were Context Precision 88%, Context Recall 42%, Answer Relevancy 57%, and Faithfulness 43%, with an average response time of 10.045 seconds. GPT-3.5 generally outperformed Gemini 1.0 Pro. Student feedback indicated that using the VA improved performance in most scenarios, reducing "I don't know" responses and increasing correct answers, with some exceptions.

Key takeaway

Research scientists developing domain-specific LLM applications should consider a RAG system with multi-query retrieval and a self-reflection mechanism. In this study the approach improved accuracy and reduced uncertainty in student inquiries, and it can mitigate common LLM problems such as hallucinations while improving contextual relevance. Prioritize comprehensive evaluation frameworks such as RAGAS, together with real-world user testing, to refine system performance and identify areas for training data improvement.

Key insights

RAG-based virtual assistants with self-reflection can effectively address LLM limitations in specialized academic contexts.

Principles

Multi-query retrieval improves document relevance.
Self-reflection mechanisms reduce LLM hallucinations.
Low temperature and structured prompts enhance answer accuracy.
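
To make the third principle concrete, here is a minimal sketch of a structured, XML-tagged prompt paired with a low sampling temperature. The tag names, the instruction wording, the build_prompt helper, and the temperature value of 0.0 are all illustrative assumptions; the summary says only that the prompts are XML-structured and the temperature is low.

```python
from xml.sax.saxutils import escape

# Assumed value: the source says "low temperature" without giving a number.
GENERATION_TEMPERATURE = 0.0


def build_prompt(question, passages):
    """Wrap retrieved passages and the question in XML-style tags (hypothetical layout)."""
    docs = "\n".join(
        f'  <document index="{i}">{escape(p)}</document>'
        for i, p in enumerate(passages, start=1)
    )
    return (
        "<instructions>Answer only from the documents. "
        "If the answer is not there, say you do not know.</instructions>\n"
        f"<documents>\n{docs}\n</documents>\n"
        f"<question>{escape(question)}</question>"
    )


prompt = build_prompt(
    "What is the deadline for Project 2-1 & 2-2?",
    ["Reports are due in week 7."],
)
print(prompt)
```

Escaping the user text keeps characters such as & and < from breaking the tag structure. The prompt string and the temperature would then be passed to whichever model client is in use.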

Method

The VA pipeline has three stages: retrieval (multi-query, vector database, Reciprocal Rank Fusion (RRF), reranker, few-shot), generation (low-temperature XML prompts), and self-reflection (question rewrite, hallucination check, answer check, clarification questions).
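
The fusion step in the retrieval stage can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not the authors' code: each query variant returns its own ranked list, and a document's fused score is the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used for RRF and is an assumption here, as are the document ids.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# One ranked list per rewritten query (hypothetical ids).
rankings = [
    ["rules.pdf#3", "faq.md#1", "rules.pdf#7"],
    ["faq.md#1", "rules.pdf#3", "handbook.md#2"],
    ["faq.md#1", "rules.pdf#7", "rules.pdf#3"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
# ['faq.md#1', 'rules.pdf#3', 'rules.pdf#7', 'handbook.md#2']
```

Because RRF uses only rank positions, it can merge lists whose raw similarity scores are not comparable. A reranker can then reorder the fused list before generation.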

In practice

Use RAG to integrate domain-specific knowledge.
Implement multi-query retrieval for diverse query perspectives.
Employ self-reflection for hallucination detection and correction.
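
A self-reflection loop of the kind described above might look like the sketch below. The control flow, the function names, the retry limit, and the fallback message are assumptions for illustration; the summary names the checks (question rewrite, hallucination check, answer check, clarification questions) but not their order or implementation. The retriever, generator, and checkers are passed in as callables, and the ones shown are toy stand-ins rather than real model calls.

```python
from typing import Callable, List

CLARIFICATION = "Could you clarify your question?"


def answer_with_reflection(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    answers_question: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Retrieve, generate, check; rewrite the query and retry when a check fails."""
    query = question
    for _ in range(max_attempts):
        passages = retrieve(query)
        answer = generate(question, passages)
        # Hallucination check, then answer check.
        if is_grounded(answer, passages) and answers_question(question, answer):
            return answer
        query = rewrite(query)
    # No attempt passed both checks: ask the user for clarification.
    return CLARIFICATION


# Toy stand-ins so the sketch runs end to end.
corpus = {"thesis deadline": ["The thesis report is due in week 7."]}

def retrieve(q): return corpus.get(q, [])
def generate(question, passages): return passages[0] if passages else "It is due in week 3."
def is_grounded(answer, passages): return answer in passages
def answers_question(question, answer): return "due" in answer
def rewrite(q): return "thesis deadline"

result = answer_with_reflection(
    "when do I hand in?", retrieve, generate, is_grounded, answers_question, rewrite
)
print(result)
```

In this run the first attempt retrieves nothing and produces an unsupported answer, so the query is rewritten and the second attempt returns the grounded passage.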

Topics

Generative AI
Retrieval-Augmented Generation
Virtual Assistant
Large Language Models
Academic Assistance

Code references

DikaVer/maastricht_university_generative_virtual_assistant

Best for: Research Scientist, NLP Engineer, AI Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.