Agents
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
This paper evaluates the vulnerability of visual CAPTCHAs to attacks by multimodal large language models (MLLMs), analyzing seven models across 18 CAPTCHA types in terms of accuracy, latency, and cost. The findings indicate that MLLMs solve simpler recognition tasks with high success rates but struggle with more complex tasks that require spatial reasoning. Based on this analysis, the authors propose defense strategies and demonstrate that structural changes to CAPTCHAs can significantly reduce MLLM success rates, which underscores the urgent need to redesign CAPTCHAs so that they remain secure as AI capabilities evolve.
captcha mllm security