Research
GENEB: Why Genomic Models Are Hard to Compare
GENEB is a large-scale diagnostic benchmark that evaluates frozen representations from 40 genomic foundation models on 100 tasks spanning 13 functional categories, using a unified probing-based protocol. By replacing fragmented evaluation practices with a single protocol, it enables controlled comparisons of model scale, architecture, tokenization, and pretraining data. The results show that model rankings vary significantly across tasks and that architectural alignment often outweighs parameter count. For practitioners, GENEB offers a structured framework for principled comparison and informed model selection in genomic machine learning.
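To make "probing frozen representations" concrete, here is a minimal sketch of the general idea, not GENEB's actual implementation. It assumes each model's embeddings have already been extracted once and are never updated; only a small linear classifier is trained on top, with the same data split for every model. The function names (`fit_linear_probe`, `probe_accuracy`, `evaluate_models`), the ridge-regression probe, and the 80/20 split are all illustrative assumptions.

```python
import numpy as np


def fit_linear_probe(train_x, train_y, num_classes, l2=1e-2):
    """Fit a ridge-regularized linear classifier on frozen embeddings.

    Assumption: a least-squares probe on one-hot targets stands in for
    whatever probe the benchmark actually uses.
    """
    # Append a bias column; the embeddings themselves are never modified.
    x = np.hstack([train_x, np.ones((train_x.shape[0], 1))])
    targets = np.eye(num_classes)[train_y]
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def probe_accuracy(weights, test_x, test_y):
    """Accuracy of the fitted probe on held-out embeddings."""
    x = np.hstack([test_x, np.ones((test_x.shape[0], 1))])
    preds = (x @ weights).argmax(axis=1)
    return float((preds == test_y).mean())


def evaluate_models(embeddings_by_model, labels, train_fraction=0.8, seed=0):
    """Score every model with the same split and the same probe."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    cut = int(train_fraction * len(labels))
    train_idx, test_idx = order[:cut], order[cut:]
    num_classes = int(labels.max()) + 1
    scores = {}
    for name, emb in embeddings_by_model.items():
        weights = fit_linear_probe(emb[train_idx], labels[train_idx], num_classes)
        scores[name] = probe_accuracy(weights, emb[test_idx], labels[test_idx])
    return scores


# Hypothetical usage with synthetic stand-ins for two models' embeddings.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
embeddings = {
    "model_a": labels[:, None] * 2.0 + rng.normal(0.0, 0.1, size=(200, 4)),
    "model_b": rng.normal(0.0, 1.0, size=(200, 4)),
}
print(evaluate_models(embeddings, labels))
```

Because the split and the probe are held fixed, differences in score reflect the embeddings alone, which is what makes the comparison controlled. Repeating this per task is what would let rankings be compared across tasks.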
llm, genomic-models, benchmark