Research · arXiv cs.CL · 11 d ago

A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization

The paper presents OmniCSEval, a comprehensive benchmark for evaluating conversation summarization using 1,800 conversations across six scenarios with context lengths from 128 to 32,000 tokens. It introduces a bidirectional fact-checking framework for assessing summary completeness, conciseness, and faithfulness, and evaluates 28 LLMs categorized by reasoning capability and model size. This study highlights ongoing challenges in LLM performance across different scenarios and offers insights for practitioners on model selection for real-world applications.

conversation · summarization · benchmark
