Agents · arXiv cs.AI · 11 d ago

TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate

The paper introduces TERMS-Bench, a Bayesian-game framework for evaluating the negotiation capabilities of large language models (LLMs). Its structured environment also acts as a verifier. The benchmark focuses on bilateral price negotiation and supports diagnostics that go beyond the raw deal rate, exposing agent-specific weaknesses in areas such as surplus extraction and belief calibration. These finer-grained measurements help practitioners identify and address specific shortcomings in agent design and performance.
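To illustrate why deal rate alone can hide differences between agents, here is a minimal sketch of two seller agents that close the same fraction of deals but capture very different shares of the available surplus. The `Episode` record, the function names, and the numbers are hypothetical and chosen only for illustration; they are not TERMS-Bench's actual metric definitions or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    """One bilateral price negotiation (hypothetical record layout)."""
    seller_cost: float            # seller's private reservation price
    buyer_value: float            # buyer's private reservation price
    deal_price: Optional[float]   # None when the parties walk away


def deal_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that ended in an agreement."""
    return sum(e.deal_price is not None for e in episodes) / len(episodes)


def seller_surplus_share(episodes: List[Episode]) -> float:
    """Mean share of the total surplus (buyer_value - seller_cost) that the
    seller captured, averaged over closed deals with positive surplus."""
    shares = []
    for e in episodes:
        total = e.buyer_value - e.seller_cost
        if e.deal_price is None or total <= 0:
            continue  # no deal, or no surplus to split
        shares.append((e.deal_price - e.seller_cost) / total)
    return sum(shares) / len(shares) if shares else float("nan")


# Illustrative data: both sellers close 2 of 3 negotiations.
agent_a = [Episode(50, 100, 90), Episode(40, 80, 70), Episode(60, 90, None)]
agent_b = [Episode(50, 100, 55), Episode(40, 80, 45), Episode(60, 90, None)]

for name, eps in [("A", agent_a), ("B", agent_b)]:
    print(name, round(deal_rate(eps), 3), round(seller_surplus_share(eps), 4))
```

Under these assumed numbers the two agents are indistinguishable by deal rate, while the surplus share separates them clearly, which is the kind of gap a diagnostic beyond deal rate is meant to surface.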

negotiation · evaluation · LLM
