Research
Automated Creativity Evaluation of Language Models Across Open-Ended Tasks
This article presents an automated framework for evaluating the creativity of large language models (LLMs) across open-ended tasks, addressing the limitations of existing task-specific metrics. The framework uses semantic entropy to measure divergent creativity and a retrieval-based multi-agent judge to assess convergent creativity, and it reports an efficiency improvement of over 60% in context-sensitive evaluations. The framework is validated across diverse domains and shown to capture facets of creativity, which positions it as a scalable and reproducible standard for creativity assessment, a prerequisite for advancing creative applications of AI.
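The summary above does not spell out how semantic entropy is computed, so the following is only a minimal sketch of one common reading: sample several responses to the same prompt, group them into clusters of equivalent meaning, and take the Shannon entropy of the cluster frequencies. All names here (`jaccard_equivalent`, `cluster_responses`, `semantic_entropy`) are hypothetical, and the token-overlap check is a crude stand-in for whatever semantic-equivalence model the framework actually uses.

```python
import math
from typing import Callable, List


def jaccard_equivalent(a: str, b: str, threshold: float = 0.5) -> bool:
    """Assumed stand-in for a semantic-equivalence test.

    Treats two responses as equivalent when their token sets overlap
    enough. A real system would likely use an entailment or embedding model.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return True
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b) >= threshold


def cluster_responses(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = jaccard_equivalent,
) -> List[List[str]]:
    """Greedily assign each response to the first cluster whose
    representative (first member) it is equivalent to."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if equivalent(response, cluster[0]):
                cluster.append(response)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([response])
    return clusters


def semantic_entropy(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = jaccard_equivalent,
) -> float:
    """Shannon entropy (in nats) over the empirical cluster distribution."""
    if not responses:
        return 0.0
    clusters = cluster_responses(responses, equivalent)
    total = len(responses)
    entropy = 0.0
    for cluster in clusters:
        p = len(cluster) / total
        entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    samples = [
        "a dragon guards the library",
        "a dragon guards the old library",
        "the moon sells tickets to dreams",
    ]
    print(semantic_entropy(samples))
```

Under this reading, a model whose samples all fall into one meaning cluster scores zero, while a model whose samples each express a different idea scores the maximum, the logarithm of the number of samples. That is why the measure suits divergent creativity: it rewards variety of meaning rather than variety of wording.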
creativity, evaluation, llm