CRED-1: Building an Open Domain Credibility Dataset for Misinformation Pre-Bunking
Summary
CRED-1 is an open, multi-signal credibility dataset designed for pre-bunking online misinformation, developed as part of doctoral research at Frankfurt University of Applied Sciences. Unlike traditional binary credibility lists, CRED-1 assigns a composite score from 0.0 to 1.0 to 2,672 domains by integrating signals such as source labels from OpenSources.co and Iffy.news, domain age, web popularity via Tranco Top-1M, Google Fact Check frequency, and Google Safe Browsing threat intelligence. This approach addresses the limitations of static, opaque, single-label systems by providing nuanced, verifiable, and configurable credibility assessments. The dataset is compact, enabling on-device, privacy-preserving lookups for applications like browser extensions, and is fully reproducible using Python scripts.
Key takeaway
For AI Scientists and Research Scientists building tools to combat online misinformation, CRED-1 offers an open, reproducible dataset for pre-bunking. Consider integrating its multi-signal scoring approach into your models or applications to provide more nuanced, privacy-preserving credibility assessments than binary labels allow. Because the dataset is compact and can be regenerated with Python scripts, it is straightforward to adopt and customize for research and deployment.
Key insights
Multi-signal credibility scoring offers a nuanced, dynamic, and transparent alternative to binary domain labeling for misinformation pre-bunking.
Principles
- Credibility is multi-faceted, not binary.
- Pre-bunking is more effective than debunking.
- On-device processing enhances privacy.
Method
CRED-1 computes a composite credibility score by weighting and combining independently verifiable signals like domain age, web popularity, fact-check frequency, and threat intelligence, allowing for configurable thresholds.
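As a rough illustration of the weighting step described above, the sketch below combines pre-normalized signals into a single score between 0.0 and 1.0. The signal names, the weights, the handling of missing signals, and the 0.5 threshold are all assumptions made for this example; the summary does not state the values CRED-1 actually uses.

```python
from typing import Dict, Optional

# Hypothetical weights, chosen only for illustration (they sum to 1.0).
# The actual CRED-1 weights are not given in this summary.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "source_label": 0.40,   # label from lists such as OpenSources.co / Iffy.news
    "domain_age": 0.15,     # older domains score higher
    "popularity": 0.15,     # derived from Tranco rank
    "fact_check": 0.15,     # fewer fact-check hits score higher
    "safe_browsing": 0.15,  # no known threats scores higher
}


def composite_score(
    signals: Dict[str, Optional[float]],
    weights: Dict[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    """Weighted mean of the available signals.

    Each signal is assumed to be normalized to [0, 1] already, where 1.0
    means "most credible". Missing signals (absent or None) are skipped and
    the remaining weights are renormalized.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        value = signals.get(name)
        if value is None:
            continue
        clamped = min(1.0, max(0.0, value))  # guard against out-of-range input
        weighted_sum += weight * clamped
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no known signals provided")
    return weighted_sum / total_weight


def is_low_credibility(score: float, threshold: float = 0.5) -> bool:
    """Configurable cut-off; 0.5 is an arbitrary example value."""
    return score < threshold
```

Renormalizing over the signals that are present keeps the score in the same 0.0 to 1.0 range when, for example, a domain has no Tranco rank. Whether CRED-1 treats missing signals this way is not stated in the summary.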
In practice
- Embed compact JSON in mobile apps.
- Integrate into browser extensions for warnings.
- Use for content moderation pipelines.
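The browser-extension and mobile use cases above come down to a local lookup against an embedded file. The sketch below shows one way that might work, assuming a hypothetical JSON layout that maps each domain to its score. The real CRED-1 file format, the sample domains, and the 0.3 warning threshold are not taken from the source.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical compact format: {"domain": score}. Placeholder domains only.
SAMPLE_JSON = '{"example-news.test": 0.12, "trusted.test": 0.93}'


def load_scores(raw: str) -> Dict[str, float]:
    """Parse the embedded JSON into a lowercase domain -> score mapping."""
    return {domain.lower(): float(score) for domain, score in json.loads(raw).items()}


def lookup(url: str, scores: Dict[str, float]) -> Optional[float]:
    """Return the score for the URL's host or nearest listed parent domain.

    Runs entirely in memory, so no visited URL leaves the device.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Try the full host first, then each parent (a.b.c -> b.c),
    # stopping before the bare top-level label.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in scores:
            return scores[candidate]
    return None


def should_warn(url: str, scores: Dict[str, float], threshold: float = 0.3) -> bool:
    """Warn only for listed domains scoring below the configurable threshold."""
    score = lookup(url, scores)
    return score is not None and score < threshold


scores = load_scores(SAMPLE_JSON)
print(should_warn("https://www.example-news.test/story?id=1", scores))
print(should_warn("https://trusted.test/", scores))
```

Unlisted domains return `None` rather than a default score, so an extension built this way would stay silent about sites the dataset does not cover.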
Topics
- CRED-1 Dataset
- Multi-Signal Scoring
- Misinformation Pre-Bunking
- Domain Credibility
- Online Misinformation
Best for: AI Scientist, Research Scientist, AI Researcher, Data Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.