Agents
TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate
The article introduces TERMS-Bench, a Bayesian-game framework for evaluating the negotiation capabilities of large language models (LLMs), in which the structured environment itself acts as a verifier. It focuses on bilateral price negotiation and supports diagnostics that go beyond simple deal rate, exposing agent-specific weaknesses in areas such as surplus extraction and belief calibration. Practitioners can use these diagnostics to identify and address specific shortcomings in agent design and performance.
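To illustrate why deal rate alone can hide differences between agents, here is a minimal Python sketch of three diagnostics for a buyer agent in bilateral price negotiation. Everything in it is an assumption made for illustration and is not taken from TERMS-Bench: the `Episode` fields, the `diagnose` helper, the definition of surplus share as the buyer's fraction of the gap between the two private valuations, and the use of a Brier score as a stand-in for belief calibration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    """One negotiation episode (hypothetical schema, not the benchmark's)."""
    buyer_value: float           # buyer's private maximum willingness to pay
    seller_cost: float           # seller's private minimum acceptable price
    deal_price: Optional[float]  # agreed price, or None if no deal was reached
    belief_prob: float           # agent's stated probability of some event
    belief_true: bool            # whether that event actually held


def diagnose(episodes: List[Episode]) -> Dict[str, float]:
    """Compute deal rate, mean buyer surplus share, and Brier score."""
    deals = [e for e in episodes if e.deal_price is not None]
    deal_rate = len(deals) / len(episodes)

    # Share of the available surplus captured by the buyer, per deal.
    # Only defined when a positive bargaining zone exists.
    shares = [
        (e.buyer_value - e.deal_price) / (e.buyer_value - e.seller_cost)
        for e in deals
        if e.buyer_value > e.seller_cost
    ]
    surplus_share = sum(shares) / len(shares) if shares else 0.0

    # Brier score: mean squared error of stated probabilities (lower is better).
    brier = sum(
        (e.belief_prob - float(e.belief_true)) ** 2 for e in episodes
    ) / len(episodes)

    return {"deal_rate": deal_rate, "surplus_share": surplus_share, "brier": brier}


# Two made-up agents with identical deal rates but different diagnostics.
strong_agent = [
    Episode(100.0, 60.0, 70.0, 0.9, True),
    Episode(80.0, 50.0, 56.0, 0.8, True),
    Episode(90.0, 70.0, None, 0.2, False),
]
weak_agent = [
    Episode(100.0, 60.0, 95.0, 0.5, True),
    Episode(80.0, 50.0, 77.0, 0.3, True),
    Episode(90.0, 70.0, None, 0.2, False),
]

strong_report = diagnose(strong_agent)
weak_report = diagnose(weak_agent)
print(strong_report)
print(weak_report)
```

In this toy data both agents close two of three negotiations, so a deal-rate leaderboard would rank them equally, while the surplus share and Brier score separate them. The actual benchmark may define these quantities differently.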
negotiation, evaluation, LLM