Agents
Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades
The paper introduces the Forced Deferral Attack (FDA), an adversarial technique that exploits the routing mechanism in multimodal large language model (MLLM) cascades by manipulating the weak model's confidence to defer queries to a stronger model. The FDA employs a temperature-flattened objective to create a universal border trigger, effectively increasing the routing to the strong model across various datasets while outperforming traditional image perturbation and prompt injection methods. This highlights a significant security vulnerability in MLLM cascades, emphasizing the need for robust defenses against adversarial manipulation of compute allocation in AI systems.
multimodalattackrouting