Research
BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts
BenSyc is introduced as the first benchmark for evaluating conversational sycophancy in Bengali social contexts, built on a dataset of 11,840 Reddit posts and 170k comments. The benchmark labels responses with a five-level taxonomy and evaluates more than 15 LLMs. Even advanced instruction-tuned models struggle to distinguish empathetic support from mere validation, reaching a maximum Macro-F1 score of only 61.8 in binary detection. The work underscores the need for culturally specific benchmarks to improve the alignment of conversational AI systems in emotionally sensitive interactions.
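The headline result is a Macro-F1 score on binary detection. As a rough sketch of how such a score is usually computed (not BenSyc's own evaluation code), Macro-F1 takes the unweighted mean of the per-class F1 scores. Because of this, a model cannot score well just by always predicting the majority class. The label encoding below (1 = sycophantic, 0 = not sycophantic) and the 0 to 100 scale are assumptions for illustration.

```python
def binary_macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels {0, 1}, scaled to 0-100.

    Assumed (hypothetical) encoding: 1 = sycophantic, 0 = not sycophantic.
    """
    f1s = []
    for cls in (0, 1):
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        f1s.append(2 * tp / denom if denom else 0.0)
    return 100 * sum(f1s) / len(f1s)


# Toy example: per-class F1 is 0.8 for class 0 and 2/3 for class 1, so the mean is about 73.3.
print(binary_macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```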
llm, sycophancy, bengali