ai-digest.dev
Agents · arXiv cs.AI · 11 d ago

Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models

The paper introduces the TAC (Travel Agent Compassion) benchmark, which evaluates whether AI agents avoid options involving animal exploitation in travel-booking scenarios. The benchmark tests seven frontier models. All seven score below the 64% chance baseline, and the best performer, Claude Opus 4.7, reaches only 53%. Adding a welfare-aware prompt significantly improves the performance of Claude and GPT-5.5, which underscores how hard it is to align AI decision-making with ethical considerations in real-world applications.

animal welfare · agents · benchmark · AI