Ablation Study of a Fairness Auditing Agentic System for Bias Mitigation in Early-Onset Colorectal Cancer Detection

2026-03-19 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Algorithmic Fairness · Depth: Advanced, extended

Summary

A study by Cedars-Sinai Medical Center researchers evaluated an agentic AI system designed to audit biomedical machine learning models for fairness in early-onset colorectal cancer (EO-CRC) detection. The system uses a two-agent architecture: a Domain Expert Agent that synthesizes literature on EO-CRC disparities, and a Fairness Consultant Agent that recommends sensitive attributes and fairness metrics. An ablation study compared three large language models served through Ollama (Llama 3.1 8B, GPT-OSS 20B, and GPT-OSS 120B) across three configurations: pretrained LLM-only, Agent without Retrieval-Augmented Generation (RAG), and Agent with RAG. The Agent with RAG configuration consistently achieved the highest semantic similarity to expert-derived reference statements, particularly for disparity identification, suggesting that agentic systems with retrieval can enhance fairness auditing in clinical AI.
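The ablation design described above is a 3 x 3 grid of models and configurations. The sketch below only enumerates that grid and picks the best-scoring configuration per model; the model and configuration labels come from the summary, while the function names and any scores passed in are hypothetical placeholders, not results or code from the study.

```python
from itertools import product

# Labels taken from the summary; everything else is illustrative.
MODELS = ["Llama 3.1 8B", "GPT-OSS 20B", "GPT-OSS 120B"]
CONFIGS = ["LLM-only", "Agent without RAG", "Agent with RAG"]


def ablation_grid(models=MODELS, configs=CONFIGS):
    """Return every (model, configuration) pair to evaluate."""
    return list(product(models, configs))


def best_config_per_model(scores):
    """Given {(model, config): similarity}, return {model: best config}.

    `scores` is a hypothetical mapping produced by whatever evaluation
    is run for each cell of the grid.
    """
    best = {}
    for (model, config), score in scores.items():
        if model not in best or score > scores[(model, best[model])]:
            best[model] = config
    return best


if __name__ == "__main__":
    for model, config in ablation_grid():
        print(f"{model:>14} | {config}")
```

Keeping the grid explicit makes it easy to confirm that every model is run under every configuration before comparing scores.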

Key takeaway

AI scientists and research scientists who develop or deploy clinical AI should prioritize agentic systems that incorporate Retrieval-Augmented Generation (RAG) for fairness auditing, especially for tasks that require deep domain knowledge, such as disparity identification. In the study, this configuration aligned most closely with expert-derived reference statements, as measured by semantic similarity, which supports its use in mitigating algorithmic bias in high-stakes applications such as early-onset colorectal cancer detection. Weigh the computational resources available: smaller models such as Llama 3.1 8B can be effective when paired with robust RAG.

Key insights

Agentic AI systems with RAG improve fairness auditing in clinical AI by grounding responses in external knowledge; in this ablation they produced the outputs closest to expert-derived reference statements.

Principles

RAG enhances LLM performance for knowledge-intensive tasks.
Agentic systems can automate complex, context-dependent auditing.
Model scale and task type influence RAG's effectiveness.

Method

A two-agent LLM system (Domain Expert, Fairness Consultant) with RAG was evaluated using semantic similarity against expert ground truth for fairness auditing in early-onset colorectal cancer.
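To illustrate what scoring an output by semantic similarity against reference statements involves, here is a minimal sketch. It assumes a bag-of-words cosine similarity as a crude stand-in for an embedding model; the summary does not say which similarity measure or embedding the authors used, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase token counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def score_against_references(output, references):
    """Mean similarity of one system output to a list of reference statements."""
    out_vec = bag_of_words(output)
    sims = [cosine_similarity(out_vec, bag_of_words(r)) for r in references]
    return sum(sims) / len(sims) if sims else 0.0
```

In a real evaluation the `bag_of_words` step would likely be replaced by a sentence-embedding model, since token overlap misses paraphrases; the averaging over reference statements is likewise only one possible aggregation.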

In practice

Use RAG for LLM-driven clinical fairness auditing.
Consider smaller LLMs (e.g., Llama 3.1 8B) with RAG for resource-constrained settings.
Tailor agent architecture to task-specific knowledge needs.
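As a rough picture of what "using RAG" means for an auditing agent, the sketch below retrieves the passages most similar to a question and places them in the prompt. It assumes Jaccard token overlap as the retriever and a hypothetical prompt template; the study's actual retriever, corpus, and prompts are not described in this summary.

```python
import re


def tokenize(text):
    """Set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=2):
    """Rank passages by Jaccard token overlap with the query; return the top k."""
    q = tokenize(query)

    def overlap(passage):
        t = tokenize(passage)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(passages, key=overlap, reverse=True)[:k]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )
```

The resulting string would be sent to whichever LLM backs the agent; a production retriever would more plausibly use dense embeddings and a vector index rather than token overlap.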

Topics

Agentic AI Systems
Algorithmic Bias Mitigation
Retrieval-Augmented Generation
Clinical AI Fairness
Early-Onset Colorectal Cancer

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.