Research · arXiv cs.CL · 8 d ago

FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification

The paper introduces FineDialFact, a benchmark for fine-grained dialogue fact verification that targets hallucinations in large language models. The dataset is constructed from existing dialogue datasets, and the authors evaluate several baseline methods on it. Chain-of-Thought reasoning improves performance, with the best F1-score reaching 0.74 on the HybriDialogue dataset. For practitioners, the benchmark offers a structured way to assess and improve the factual consistency of dialogue systems, while highlighting challenges that remain open in the field.

llm · dialogue · verification · benchmark
