Safety · arXiv cs.AI · 8 d ago

AgentFairBench: Do LLM Agents Discriminate When They Act?

AgentFairBench is a new benchmark for measuring demographic disparity in the actions of large language model (LLM) agents across three domains: hiring, lending, and medical triage. It pairs synthetic, demographic-neutral profiles with several agent scaffolds and reports disparity through metrics such as counterfactual flip rate and action-rate disparity. For AI developers, it offers a reproducible, low-cost way to evaluate fairness in LLM decision-making before agents are deployed in sensitive applications.
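The two metrics named above can be illustrated with a minimal Python sketch. The definitions here are assumptions based on common usage, not the benchmark's own formulas: flip rate is taken as the share of profiles whose action changes when only the demographic attribute is swapped, and action-rate disparity as the gap between the highest and lowest per-group rate of a chosen action. The function names, group labels, and action labels are hypothetical.

```python
from collections import defaultdict


def counterfactual_flip_rate(pairs):
    """Share of profiles whose action changes under a demographic swap.

    `pairs` is a list of (original_action, counterfactual_action) tuples,
    one per profile. Assumed definition, not taken from the paper.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(1 for original, swapped in pairs if original != swapped)
    return flips / len(pairs)


def action_rate_disparity(records, positive_action):
    """Gap between the highest and lowest per-group rate of one action.

    `records` is a list of (group, action) tuples. Assumed definition,
    not taken from the paper.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, action in records:
        totals[group] += 1
        if action == positive_action:
            positives[group] += 1
    if not totals:
        raise ValueError("need at least one record")
    rates = {group: positives[group] / totals[group] for group in totals}
    return max(rates.values()) - min(rates.values())


# Hypothetical agent outputs for a hiring-style task.
pairs = [
    ("advance", "advance"),
    ("advance", "reject"),
    ("reject", "reject"),
    ("reject", "advance"),
]
records = [
    ("A", "advance"), ("A", "advance"), ("A", "reject"), ("A", "reject"),
    ("B", "advance"), ("B", "reject"), ("B", "reject"), ("B", "reject"),
]

flip_rate = counterfactual_flip_rate(pairs)              # 2 of 4 pairs differ
disparity = action_rate_disparity(records, "advance")    # 0.50 vs 0.25
print(flip_rate, disparity)
```

The two numbers answer different questions: flip rate is a per-profile measure of sensitivity to the demographic attribute, while action-rate disparity compares aggregate outcomes between groups, so an agent can score low on one and high on the other.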

fairness · LLM · benchmark · agents
