Research
LLM Jaggedness Unlocks Scientific Creativity
The paper introduces SciAidanBench, a benchmark that assesses the scientific creativity of large language models (LLMs) by measuring how many unique, coherent responses they generate to open-ended scientific questions. Evaluations of 19 base models across 30 variants reveal a phenomenon the authors term "jaggedness": creative performance is uneven across tasks, prompts, and domains. The paper proposes treating this jaggedness as a resource rather than a flaw, one that techniques such as inference-time compute and knowledge pooling can exploit to improve performance. It suggests that understanding these variability patterns can guide the design of more effective meta-model ensembles for scientific idea generation.
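As a rough illustration of why pooling across jagged models can help, the sketch below counts unique ideas per model and then for the merged pool. It is not the paper's method. The function names (`unique_ideas`, `pool`), the word-overlap (Jaccard) similarity, the 0.5 threshold, and the toy answers are all assumptions made for this example, and the coherence check that the benchmark also applies is left out entirely.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a real text embedding."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1] (assumed metric, not the paper's)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def unique_ideas(ideas: List[str], threshold: float = 0.5) -> List[str]:
    """Keep an idea only if it is dissimilar to every idea already kept."""
    kept: List[str] = []
    for idea in ideas:
        if all(jaccard(idea, k) < threshold for k in kept):
            kept.append(idea)
    return kept


def pool(per_model: Dict[str, List[str]], threshold: float = 0.5) -> List[str]:
    """Merge every model's ideas, then deduplicate across the whole pool."""
    merged = [idea for ideas in per_model.values() for idea in ideas]
    return unique_ideas(merged, threshold)


# Hypothetical answers from two hypothetical models to one open-ended question.
answers = {
    "model_a": [
        "use crispr to edit gene x",
        "use crispr to edit gene x today",
    ],
    "model_b": [
        "simulate protein folding with monte carlo",
        "simulate protein folding with monte carlo methods",
    ],
}

per_model_counts = {name: len(unique_ideas(ideas)) for name, ideas in answers.items()}
pooled_count = len(pool(answers))
print(per_model_counts, pooled_count)
```

In this toy case each model contributes only one distinct idea on its own, while the pool holds two, because the models repeat themselves in different directions. A real pipeline would replace the word-overlap measure with a stronger similarity model and would filter incoherent answers before counting.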
creativity, llm, benchmark