The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models
This article studies the "dark regulome": how noncoding elements influence synaptogenic gene expression in gliomas, as analyzed with three sequence foundation models (Caduceus-Ph, HyenaDNA, and Enformer). The authors introduce a residualization-and-permutation diagnostic that separates predictability-driven from regulation-driven variance in gene expression across 30,448 dark genome elements. The diagnostic reveals that the 10 kb proximal-regulatory horizon remains consistent, whereas the element-class hierarchy derived from the language models does not. For practitioners building AI models for genomics, the diagnostic offers a way to check whether a model-derived signal reflects regulation rather than sequence predictability.
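
The summary names the diagnostic but does not spell out its steps, so the following is only a minimal sketch of what a generic residualization-and-permutation check might look like, not the authors' implementation. It assumes each element has a model-derived score, a predictability covariate (for example, a language-model likelihood), and an expression value. All function names, variable names, and the synthetic data are hypothetical. The idea is to regress the predictability covariate out of the score and then ask, by permutation, whether the residual still tracks expression.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after ordinary least squares on covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Pearson r between x and y, with a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling y breaks any element-level pairing with x.
        null[i] = np.corrcoef(x, rng.permutation(y))[0, 1]
    # Add-one correction keeps the p-value strictly positive.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p

def diagnostic(score, predictability, expression, n_perm=200, seed=0):
    """Compare the score-expression association before and after residualization."""
    raw_r, raw_p = permutation_pvalue(score, expression, n_perm, seed)
    resid = residualize(score, predictability)
    resid_r, resid_p = permutation_pvalue(resid, expression, n_perm, seed)
    return {"raw_r": raw_r, "raw_p": raw_p, "resid_r": resid_r, "resid_p": resid_p}

# Synthetic illustration (hypothetical data, not from the study): the score and
# the expression are linked only through a shared predictability component.
rng = np.random.default_rng(42)
n = 2000
predictability = rng.normal(size=n)
score = predictability + rng.normal(size=n)
expression = predictability + rng.normal(size=n)

result = diagnostic(score, predictability, expression)
resid = residualize(score, predictability)
print(result)
```

In this synthetic setup the raw correlation is substantial, while the residualized correlation falls close to zero. That is the pattern one would expect when an apparent association is driven by predictability alone. A residual association that survives permutation would instead point toward signal beyond predictability.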