Agents
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
The article introduces the TAC (Travel Agent Compassion) benchmark, which evaluates whether AI agents avoid options involving animal exploitation in travel booking scenarios. The benchmark assesses seven frontier models and finds that all score below the chance level of 64%, with Claude Opus 4.7 achieving the highest score at 53%. Incorporating a welfare-aware prompt significantly improves performance for Claude and GPT-5.5. The results highlight the difficulty of aligning AI decision-making with ethical considerations in real-world applications.
animal welfare, agents, benchmark, AI