CRED-1: Building an Open Domain Credibility Dataset for Misinformation Pre-Bunking

2026-03-21 · Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, short

Summary

CRED-1 is an open, multi-signal credibility dataset designed for pre-bunking online misinformation, developed as part of doctoral research at Frankfurt University of Applied Sciences. Unlike traditional binary credibility lists, CRED-1 assigns a composite score from 0.0 to 1.0 to 2,672 domains by integrating signals such as source labels from OpenSources.co and Iffy.news, domain age, web popularity via Tranco Top-1M, Google Fact Check frequency, and Google Safe Browsing threat intelligence. This approach addresses the limitations of static, opaque, single-label systems by providing nuanced, verifiable, and configurable credibility assessments. The dataset is compact, enabling on-device, privacy-preserving lookups for applications like browser extensions, and is fully reproducible using Python scripts.

Key takeaway

For AI and research scientists building tools to combat online misinformation, CRED-1 offers an open-source dataset for pre-bunking. Consider integrating its multi-signal scoring into your models or applications to give more nuanced, privacy-preserving credibility assessments than binary labels allow. The dataset's reproducibility and compact format make it straightforward to adopt and customize across research and deployment scenarios.

Key insights

Multi-signal credibility scoring offers a nuanced, dynamic, and transparent alternative to binary domain labeling for misinformation pre-bunking.

Principles

Credibility is multi-faceted, not binary.
Pre-bunking is more effective than debunking.
On-device processing enhances privacy.

Method

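CRED-1 computes a composite credibility score as a weighted combination of independently verifiable signals, such as source labels, domain age, web popularity, fact-check frequency, and threat intelligence. Because the result is a continuous score from 0.0 to 1.0 rather than a fixed label, each application can set its own thresholds.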
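
The weighted combination described above can be sketched as follows. This is a minimal illustration, not the dataset's actual formula: the summary does not give the real weights or normalization, so the `DomainSignals` fields, the `HYPOTHETICAL_WEIGHTS` values, and the assumption that every signal is already scaled to the 0.0 to 1.0 range (higher meaning more credible) are all placeholders.

```python
from dataclasses import dataclass


@dataclass
class DomainSignals:
    """Per-domain signals, each assumed to be pre-normalized to [0.0, 1.0].

    Higher always means "more credible" in this sketch. The field names
    are hypothetical and do not reflect the CRED-1 schema.
    """
    source_label: float   # e.g. 0.0 if flagged by a source list, 1.0 if not
    domain_age: float     # older domains closer to 1.0
    popularity: float     # higher popularity rank closer to 1.0
    fact_check: float     # fewer fact-check hits closer to 1.0
    safe_browsing: float  # 0.0 if a threat is reported, 1.0 otherwise


# Hypothetical weights chosen only for illustration; they sum to 1.0.
HYPOTHETICAL_WEIGHTS = {
    "source_label": 0.40,
    "domain_age": 0.15,
    "popularity": 0.15,
    "fact_check": 0.15,
    "safe_browsing": 0.15,
}


def composite_score(signals: DomainSignals, weights=None) -> float:
    """Return the weighted average of the signals, clamped to [0.0, 1.0]."""
    weights = HYPOTHETICAL_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * getattr(signals, name) for name, w in weights.items()) / total
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    flagged = DomainSignals(
        source_label=0.0, domain_age=1.0, popularity=1.0,
        fact_check=1.0, safe_browsing=1.0,
    )
    print(round(composite_score(flagged), 2))
```

Dividing by the sum of the weights keeps the result in range even if a caller passes custom weights that do not sum to 1.0, which is one way to make the weighting configurable.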

In practice

Embed compact JSON in mobile apps.
Integrate into browser extensions for warnings.
Use in content moderation pipelines.
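
An on-device lookup of the kind listed above might look like the sketch below. It assumes a flat JSON object mapping each domain to its score, which is a hypothetical layout rather than the published CRED-1 schema; the sample domains, the `should_warn` helper, and the 0.4 threshold are likewise illustrative.

```python
import json
from urllib.parse import urlsplit

# Hypothetical embedded payload: domain -> composite score in [0.0, 1.0].
# The real CRED-1 file layout may differ.
SAMPLE_JSON = '{"example-unreliable.test": 0.12, "example-news.test": 0.87}'


def load_scores(raw: str) -> dict:
    """Parse the embedded JSON into a lowercase domain -> score mapping."""
    return {domain.lower(): float(score) for domain, score in json.loads(raw).items()}


def lookup(url: str, scores: dict):
    """Return the score for the URL's host, or None if the domain is unknown."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return scores.get(host)


def should_warn(url: str, scores: dict, threshold: float = 0.4) -> bool:
    """Warn only for known domains scoring below the configurable threshold."""
    score = lookup(url, scores)
    return score is not None and score < threshold


if __name__ == "__main__":
    scores = load_scores(SAMPLE_JSON)
    for url in ("https://www.example-unreliable.test/story",
                "https://example-news.test/",
                "https://unknown.test/"):
        print(url, lookup(url, scores), should_warn(url, scores))
```

Everything runs against a local dictionary, so no visited URL has to leave the device. Unknown domains return `None` and trigger no warning here; a real extension would need to decide how to treat them.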

Topics

CRED-1 Dataset
Multi-Signal Scoring
Misinformation Pre-Bunking
Domain Credibility
Online Misinformation

Best for: AI Scientist, Research Scientist, AI Researcher, Data Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.