ResearcharXiv cs.AI — 21 h ago

AMEL: Accumulated Message Effects on LLM Judgments

The paper introduces the concept of Accumulated Message Effects on LLM Judgments (AMEL), demonstrating that the polarity of prior conversation history significantly biases subsequent evaluations made by large language models (LLMs). Analyzing 84,088 API calls across 12 models from five providers, the study finds that models shift their judgments toward the prevailing sentiment of prior messages, with a notable negativity asymmetry where negative histories induce greater bias (1.52x more) than positive ones. This research highlights the importance of context management in evaluation pipelines, suggesting that using fresh contexts per item or balancing the history can mitigate bias in automated evaluations.

llmbiasjudgmentsevaluationrelevance 0.00 · engagement 0.00

Read at source ↗← all news