SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding
SciHorizon-GENE is a large-scale benchmark for evaluating large language models (LLMs) on gene-centric reasoning tasks. It is built from curated knowledge on over 190,000 human genes and comprises more than 540,000 questions. The benchmark assesses LLMs on aspects including research attention sensitivity, hallucination tendency, answer completeness, and literature influence. The evaluation reveals substantial variability in performance across models and persistent difficulty in producing accurate biological interpretations. For practitioners in biomedical AI, the benchmark provides a systematic framework for model evaluation and selection, which bears directly on how reliably LLMs can be used for biological interpretation.