ai-digest.dev
Safety · arXiv cs.AI · 21 h ago

PhantomBench: Benchmarking the Non-existential Threat of Language Models

PhantomBench, a benchmark introduced in arXiv:2606.11105v1, measures how often 21 language models hallucinate when asked about more than 60,000 non-existent terms and entities spanning diverse domains. The reported hallucination rates are high, with averages reaching 86.7%, which indicates that even advanced models often fail to recognize that a concept does not exist. Beyond assessing model behavior on rare concepts, the benchmark provides a scalable pipeline for generating tailored non-existent concepts, which practitioners can use when working to mitigate hallucination risks.
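To make the headline metric concrete, here is a minimal sketch of how a hallucination rate over non-existent terms could be computed. It is an illustration, not the paper's method: the summary does not describe how PhantomBench judges responses, so the keyword-based check, the `REFUSAL_MARKERS` list, and the `ask_model` callable are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical phrases that signal the model recognized the term as unknown.
# The actual PhantomBench judging procedure is not given in this summary.
REFUSAL_MARKERS = (
    "does not exist",
    "not aware of",
    "no record of",
    "not a real",
    "cannot find",
)


def is_hallucination(response: str) -> bool:
    """Treat a response as a hallucination if it contains no refusal marker."""
    text = response.lower()
    return not any(marker in text for marker in REFUSAL_MARKERS)


def hallucination_rate(
    terms: Iterable[str], ask_model: Callable[[str], str]
) -> float:
    """Fraction of non-existent terms the model answers as if they were real."""
    term_list = list(terms)
    if not term_list:
        raise ValueError("need at least one term")
    hallucinated = sum(is_hallucination(ask_model(term)) for term in term_list)
    return hallucinated / len(term_list)
```

Here `ask_model` stands in for whatever function sends a prompt about a term to a model and returns its text reply. A keyword check like this is crude; a real evaluation would likely need a more robust judge.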

hallucinations · language models · benchmarking · relevance 0.00 · engagement 0.00