Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

A virtual assistant (VA) based on Retrieval-Augmented Generation (RAG) was developed and evaluated to help Maastricht University students with questions about project-specific regulations. The VA combines a multi-query retrieval mechanism with a self-reflection system to improve accuracy and to address hallucinations and the need for context-specific responses in a specialized domain. It was tested with 64 DACS bachelor students on June 20-21, 2024, using eight scenarios. Evaluation metrics were Context Precision (88%), Context Recall (42%), Answer Relevancy (57%), and Faithfulness (43%), with an average response time of 10.045 seconds. In the model comparison, GPT-3.5 generally outperformed Gemini 1.0 Pro. Student feedback indicated that, in most scenarios, using the VA reduced "I don't know" responses and increased correct answers, with some exceptions.

Key takeaway

For research scientists developing domain-specific LLM applications: consider integrating a RAG system with multi-query retrieval and a self-reflection mechanism. In this study the approach improved accuracy and reduced uncertainty in student inquiries, and it can mitigate common LLM problems such as hallucinations while improving contextual relevance. Prioritize comprehensive evaluation frameworks such as RAGAS, together with real-world user testing, to refine system performance and identify areas for training data improvement.

Key insights

RAG-based virtual assistants with self-reflection can effectively address LLM limitations in specialized academic contexts.

Principles

Method

The VA pipeline has three stages: retrieval (multi-query generation, vector database search, Reciprocal Rank Fusion (RRF), reranking, few-shot examples), generation (XML-structured prompts at low temperature), and self-reflection (question rewriting, hallucination check, answer check, clarification questions).
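
The fusion step in the retrieval stage can be illustrated with a minimal sketch of Reciprocal Rank Fusion (RRF), which merges the ranked lists returned for several rewrites of one question into a single ranking. This is not the authors' implementation: the function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions made for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); the contributions are summed across lists.
    k=60 is an assumed default, not a value taken from the study.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retrieval results for three rewrites of one student question.
rankings = [
    ["doc_deadlines", "doc_grading", "doc_groups"],
    ["doc_grading", "doc_deadlines", "doc_resit"],
    ["doc_deadlines", "doc_resit", "doc_grading"],
]

fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Documents that rank well across several query rewrites rise to the top even if no single query placed them first, which is why RRF pairs naturally with multi-query retrieval. In a pipeline like the one described, the fused list would then be passed on to the reranker.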

Best for: Research Scientist, NLP Engineer, AI Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.