Research
FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification
The article introduces FineDialFact, a benchmark for fine-grained dialogue fact verification aimed at detecting hallucinations that large language models produce in dialogue systems. Rather than assigning a single factual label to an entire response, the task verifies the individual atomic facts extracted from it. The authors build the dataset from existing dialogue datasets and evaluate several baseline methods, finding that methods incorporating Chain-of-Thought (CoT) reasoning improve performance. Even so, the best F1-score on the HybriDialogue dataset is only 0.74, which shows that the task remains challenging. For practitioners, the benchmark offers a structured way to assess and improve the factual consistency of dialogue systems.
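To make the difference between response-level and fine-grained labels concrete, here is a minimal Python sketch that scores per-fact predictions with a macro-averaged F1. The three-way label set (`supported`, `refuted`, `unverifiable`), the function name `macro_f1`, and the choice of macro averaging are all assumptions for illustration; the summary does not state which labels or F1 variant the benchmark actually uses.

```python
# Hypothetical per-fact label set; FineDialFact's actual labels may differ.
LABELS = ("supported", "refuted", "unverifiable")


def macro_f1(gold, pred, labels=LABELS):
    """Macro-averaged F1 over per-fact labels (an assumed metric variant)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Unweighted mean of per-label F1 scores.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One dialogue response split into three atomic facts (invented example).
    gold = ["supported", "refuted", "unverifiable"]
    pred = ["supported", "supported", "unverifiable"]
    print(round(macro_f1(gold, pred), 4))
```

Because each atomic fact is scored separately, a verifier that mislabels one fact out of three is penalized only for that fact, which a single response-level label cannot express.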
llm, dialogue, verification, benchmark