Research · arXiv cs.CL · 2 d ago

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

BenSyc is presented as the first benchmark for evaluating conversational sycophancy in Bengali social contexts. It is built on a dataset of 11,840 Reddit posts and 170k comments, with responses labeled under a five-level taxonomy. An evaluation of more than 15 LLMs shows that even advanced instruction-tuned models struggle to distinguish empathetic support from validation: the best Macro-F1 score in binary detection is only 61.8. The authors argue that culturally specific benchmarks are needed to improve the alignment of conversational AI systems in emotionally sensitive interactions.

llm · sycophancy · bengali
