The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models

Source: cs.CL updates on arXiv.org · Field: Science & Research — Life Sciences & Biology, Health & Medical Research, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This study introduces a residualization-and-permutation diagnostic for interpreting genomic foundation model outputs on the "dark regulome" of high-grade gliomas. The authors applied it to Caduceus-Ph, HyenaDNA, and Enformer across 30,448 dark-genome elements at 92 glioma-relevant loci, and it separates sequence predictability from regulatory influence. It reveals a consistent 10 kb proximal-regulatory horizon and a clear architectural split. The language models (Caduceus-Ph, HyenaDNA) share a predictability layer that ranks long transposable elements highly. Enformer alone identifies a regulatory-output layer of short proximal cCREs, and the two layers share none of their top-100 elements. The analysis also finds a 3.3× enrichment of brain cis-eQTLs among the top-100 elements (empirical p < 5 × 10^-3), providing validated synaptogenic-locus candidates.
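The eQTL enrichment above is a top-k overlap statistic paired with an empirical p-value. The following is a minimal sketch of such a test. It assumes the null draws k elements uniformly at random from all scored elements; the summary does not describe the null the authors used. `top_k_enrichment`, `scores`, and `is_eqtl` are hypothetical names. One property holds for any test of this kind: the smallest attainable empirical p-value is 1/(n_perm + 1), so the number of permutations bounds how small a reported p_emp can be.

```python
import random


def top_k_enrichment(scores, is_eqtl, k=100, n_perm=1000, seed=0):
    """Fold enrichment of eQTL-overlapping elements among the top-k by score.

    scores  : per-element model scores (hypothetical input)
    is_eqtl : 1/0 (or True/False) flag per element for cis-eQTL overlap
    Returns (fold, p_emp). The null draws k elements uniformly at random.
    """
    n = len(scores)
    # Rank elements by score, highest first, and count eQTL hits in the top k.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    observed = sum(is_eqtl[i] for i in order[:k])
    # Expected hits under the genome-wide (here: all-element) base rate.
    expected = k * sum(is_eqtl) / n
    fold = observed / expected if expected > 0 else float("inf")

    # Empirical null: how often does a random k-subset do at least as well?
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(idx, k)
        if sum(is_eqtl[i] for i in draw) >= observed:
            hits += 1
    # Add-one correction keeps p_emp strictly positive.
    p_emp = (hits + 1) / (n_perm + 1)
    return fold, p_emp
```

On synthetic data where 30 of the top 100 elements carry eQTLs against a 10% base rate, this returns a fold enrichment of 3.0. With `n_perm=199` and no null draw reaching 30 hits, p_emp is 1/200.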

Key takeaway

If you interpret genomic foundation model outputs, apply the residualization-and-permutation diagnostic so that sequence predictability is not mistaken for regulatory function. The diagnostic helps identify genuinely regulatory elements, especially those within the robust 10 kb proximal-regulatory horizon. Prioritize experimental validation of candidates that carry regulatory-output-layer signals, such as those uniquely identified by Enformer, to keep follow-up work biologically relevant.
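Shortlisting Enformer-specific candidates comes down to comparing top-k sets across models. The following minimal sketch assumes each model's output has already been reduced to one score per element, stored in a dict keyed by a hypothetical element ID. `top_k` and `layer_split` are illustrative names, not taken from the source.

```python
def top_k(scores, k):
    """IDs of the k highest-scoring elements (ties broken by ID for determinism)."""
    return set(sorted(scores, key=lambda e: (-scores[e], e))[:k])


def layer_split(enformer_scores, lm_scores, k=100):
    """Return (overlap size, Enformer-only IDs) between two models' top-k sets.

    An overlap of 0 corresponds to the 'zero top-100 overlap' pattern; the
    Enformer-only IDs are the regulatory-output-layer candidates to prioritize.
    """
    e_top = top_k(enformer_scores, k)
    lm_top = top_k(lm_scores, k)
    return len(e_top & lm_top), sorted(e_top - lm_top)
```

When two models rank the same elements in opposite orders, their top-2 sets are disjoint, and every Enformer top-2 element is returned as Enformer-only.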

Key insights

A diagnostic separates sequence predictability from true regulatory influence in genomic foundation models, revealing distinct functional layers.

Principles

Method

The residualization-and-permutation diagnostic separates predictability-driven from regulation-driven variance in RIS. It regresses out nuisance covariates (k-mer entropy, GC content, log element length, log TSS distance) and compares the resulting signal against permutation nulls.
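One way to combine the two steps is sketched below under stated assumptions. RIS is treated as a single per-element score, and "predictability-driven" variance is measured as the R² of an ordinary least-squares fit of RIS on the four nuisance covariates. That R² is tested against a null in which RIS values are shuffled across elements. The summary does not give the authors' exact statistic, so this is an illustration rather than their method, and the function names are hypothetical. The returned residuals are the covariate-adjusted RIS, which would carry any regulation-driven signal.

```python
import random


def ols_residuals(y, X):
    """Residuals of y after OLS on covariate rows X, with an intercept added.

    Solves the normal equations by Gaussian elimination with partial pivoting;
    collinear covariates would make them singular.
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return [yi - sum(bi * ri for bi, ri in zip(b, r)) for r, yi in zip(rows, y)]


def r_squared(y, resid):
    """Fraction of the variance of y explained by the fit that produced resid."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - sum(e * e for e in resid) / ss_tot


def nuisance_diagnostic(ris, covariates, n_perm=200, seed=0):
    """Residualized RIS, nuisance R^2, and its empirical permutation p-value.

    covariates: one row per element, e.g. [kmer_entropy, gc, log_len, log_tss].
    """
    resid = ols_residuals(ris, covariates)
    r2_obs = r_squared(ris, resid)
    rng = random.Random(seed)
    shuffled = list(ris)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any RIS-covariate link
        if r_squared(shuffled, ols_residuals(shuffled, covariates)) >= r2_obs:
            hits += 1
    return resid, r2_obs, (hits + 1) / (n_perm + 1)
```

A high nuisance R² that beats the permutation null indicates that a model's ranking is largely explained by sequence predictability. Any residual signal would then be the part worth testing for regulatory function. A production version would likely use a numerically robust least-squares solver rather than the normal equations.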

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.