On GATE, Text and Social Media Analysis, and Detecting Misinformation Online
Summary
A Master's thesis project explored the use of Retrieval-Augmented Generation (RAG) systems in newsroom environments, with the aim of strengthening journalistic integrity and the traceability of AI-assisted work. The researcher, Tasos Galanopoulos, visited the University of Sheffield's GATE team from April 7 to 17, 2026, to develop and test a configurable RAG system. The system, built with Streamlit, ChromaDB, and open-access LLMs such as Mistral and DeepSeek, lets journalists configure parameters interactively and evaluate the outputs. Experiments covered four journalistic datasets (economic reports, political interviews, newspaper editorials, central bank reports) and four response styles (Strict RAG, Journalistic Style, Analysis & Key Points, Archivist). Performance was measured with Faithfulness, Answer Relevance, Context Precision, and Ground Truth Similarity. The results indicate that dataset structure affects RAG performance more than parameter tuning does.
Key takeaway
AI scientists developing tools for newsrooms should recognize that a "one-size-fits-all" RAG assistant is not viable. Prioritize adaptive RAG systems that adjust retrieval and generation parameters to the characteristics of each dataset and the requirements of each journalistic task, so that outputs stay both grounded and relevant.
Key insights
RAG systems offer traceable AI assistance for journalism, but performance heavily depends on dataset characteristics, not just parameter tuning.
Principles
- Dataset structure dictates RAG performance.
- Narrative text suits standard RAG pipelines.
- Adaptive RAG systems are essential for diverse domains.
Method
A RAG application was developed using Streamlit, ChromaDB, and open-access LLMs. It allows users to upload documents, configure retrieval/generation parameters, and run quantitative assessments using embedding-based metrics.
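The summary does not say how the embedding-based metrics are computed. As a minimal sketch, assume that a metric such as Ground Truth Similarity is the cosine similarity between an embedding of the generated answer and an embedding of a reference answer, and that retrieval ranks chunks by the same similarity. The `toy_embed` function below is a hypothetical stand-in (a bag-of-words count vector) for a real embedding model, and every function name here is illustrative rather than taken from the thesis code.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def ground_truth_similarity(answer: str, reference: str) -> float:
    """Assumed metric definition: similarity of answer and reference embeddings."""
    return cosine_similarity(toy_embed(answer), toy_embed(reference))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the top_k best."""
    q = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(q, toy_embed(c)),
        reverse=True,
    )
    return ranked[:top_k]
```

In a real pipeline the toy embedding would be replaced by a sentence-embedding model and the ranking by a vector store query; the scoring logic would otherwise keep the same shape.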
In practice
- Use RAG for source-grounded journalistic content.
- Tailor RAG configurations to document types.
- Consider adaptive RAG for varied newsroom tasks.
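One way to picture tailoring configurations to document types is a lookup of per-type parameter profiles with a default fallback. The sketch below is illustrative only: the `RagConfig` fields, the profile names, and all numeric values are hypothetical placeholders, not settings or results reported by the thesis. It also assumes that interviews and editorials count as narrative text and so keep the default profile.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical bundle of retrieval and generation parameters."""
    chunk_size: int      # characters per chunk
    chunk_overlap: int   # characters shared by neighbouring chunks
    top_k: int           # number of chunks passed to the LLM
    temperature: float   # sampling temperature for generation


# Placeholder defaults, assumed here to suit narrative text.
DEFAULT_CONFIG = RagConfig(chunk_size=800, chunk_overlap=100, top_k=4, temperature=0.2)

# Placeholder profiles keyed by document type; the values are NOT findings.
PROFILES = {
    "political_interview": DEFAULT_CONFIG,
    "newspaper_editorial": DEFAULT_CONFIG,
    "economic_report": replace(DEFAULT_CONFIG, chunk_size=400, top_k=6),
    "central_bank_report": replace(
        DEFAULT_CONFIG, chunk_size=400, top_k=6, temperature=0.0
    ),
}


def select_config(document_type: str) -> RagConfig:
    """Return the profile for a document type, or the default if unknown."""
    return PROFILES.get(document_type, DEFAULT_CONFIG)
```

A fully adaptive system would go further and choose the profile from measured dataset characteristics instead of a hand-assigned label, but the lookup shows where such a decision would plug in.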
Topics
- Retrieval-Augmented Generation
- Newsroom AI Applications
- Large Language Models
- RAG System Evaluation
- Journalistic Datasets
Best for: AI Scientist, AI Student, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by On GATE, Text and Social Media Analysis, and Detecting Misinformation Online.