Research
Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry
The paper introduces a method for predicting compositional errors in large language models (LLMs) from their representational geometry. It shows that a model struggles to combine two concepts when the concepts' linear encodings lie close together, which causes interference, whereas near-orthogonal encodings support successful composition. The approach makes it possible to identify high-risk concept combinations in advance, which could improve stress testing and active learning in practical applications.
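The geometric idea can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each concept has a known linear direction vector and uses absolute cosine similarity as a stand-in for "closeness", since the summary does not say which score the paper uses. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pairs_by_overlap(directions):
    """Return (score, concept_a, concept_b) tuples, most overlapping first.

    Assumption: a high |cosine| between two concept directions is treated
    as a proxy for interference risk when the concepts are combined.
    """
    scored = [
        (abs(cosine(directions[a], directions[b])), a, b)
        for a, b in combinations(sorted(directions), 2)
    ]
    return sorted(scored, reverse=True)


# Toy, made-up directions for illustration only (not extracted from a model).
toy_directions = {
    "red": [1.0, 0.0, 0.0],
    "crimson": [0.9, 0.1, 0.0],
    "square": [0.0, 0.0, 1.0],
}

ranked = rank_pairs_by_overlap(toy_directions)
for score, a, b in ranked:
    print(f"{a} + {b}: |cos| = {score:.3f}")
```

Under this assumption, pairs at the top of the ranking (nearly parallel directions) would be the first candidates for stress testing, while pairs near zero (near-orthogonal directions) would be expected to compose cleanly.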
llm, compositional errors, adversarial search