GhazalBench: Evaluating LLM Understanding and Canonical Surface-Form Access in Persian Ghazals
GhazalBench is a newly introduced benchmark that evaluates large language models (LLMs) on Persian ghazals along two axes: poem-to-prose understanding, and access to the poems' canonical surface forms. The results show that models capture poetic meaning but struggle to complete verses exactly in open-ended settings. The findings point to the need for evaluation frameworks that cover both semantic understanding and access to culturally significant text, a concern for practitioners building LLMs for culturally nuanced applications.
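To make the surface-form side concrete, here is a minimal Python sketch of how an exact verse completion could be scored. It is not GhazalBench's actual scoring code: the function names `normalize` and `exact_match_accuracy`, and the choice of normalization (Unicode NFC, mapping Arabic yeh and kaf to their Persian forms, collapsing whitespace), are assumptions made for illustration only.

```python
import unicodedata

# Assumed normalization (not taken from the benchmark): map Arabic yeh and kaf
# to their Persian counterparts so that encoding variants do not count as misses.
_CHAR_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})


def normalize(verse: str) -> str:
    """Return a canonical comparison form of a verse (hypothetical helper)."""
    text = unicodedata.normalize("NFC", verse).translate(_CHAR_MAP)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted verses that equal the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Under this kind of metric a paraphrase that preserves the meaning of a verse still scores zero, which is why exact completion can diverge from poem-to-prose understanding.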
llm, persian poetry, benchmark