Agents
EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
EEVEE is a multi-dataset test-time prompt learning framework for LLM agents that targets the heterogeneous input streams found in real-world applications. It pairs a router that clusters incoming tasks with a co-evolution strategy that optimizes the router and the prompts together. EEVEE improves the average multi-benchmark score by 10.38 points over Qwen3-4B-Instruct and by 24.32 points over DeepSeek-V3.2, and it outperforms the SOTA methods GEPA and ACE by up to 37.2% and 48.2%, respectively. The framework is relevant to practitioners building robust AI systems that must adapt to diverse and dynamic task environments.
test-time learning, prompt learning, llm
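The summary above does not say how the router or the co-evolution strategy is implemented, so the following is only a toy sketch of the general idea: route each incoming task to a cluster, then update both the cluster's routing state and its prompt from feedback. Everything in it is an assumption made for illustration and not EEVEE's actual method: the `ToyRouter` and `Cluster` names, the Jaccard word-overlap similarity, the 0.2 threshold, and the hand-written "lesson" strings that stand in for feedback from an optimizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BASE_PROMPT = "You are a helpful agent."  # hypothetical starting prompt


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a stand-in for a real task embedding."""
    return set(text.lower().split())


@dataclass
class Cluster:
    vocab: set[str]                     # routing state for this cluster
    prompt: str = BASE_PROMPT           # prompt specialised to this cluster
    notes: list[str] = field(default_factory=list)


class ToyRouter:
    """Hypothetical router: assigns tasks to clusters by word overlap."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.clusters: list[Cluster] = []

    def route(self, task: str) -> int:
        """Return the index of the best-matching cluster, creating one if none fits."""
        words = tokens(task)
        best_idx, best_sim = -1, 0.0
        for i, cluster in enumerate(self.clusters):
            union = words | cluster.vocab
            sim = len(words & cluster.vocab) / len(union) if union else 0.0
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx == -1 or best_sim < self.threshold:
            self.clusters.append(Cluster(vocab=set(words)))
            return len(self.clusters) - 1
        return best_idx

    def update(self, idx: int, task: str, lesson: str) -> None:
        """Joint update: widen the cluster's routing vocabulary and revise its prompt."""
        cluster = self.clusters[idx]
        cluster.vocab |= tokens(task)           # router side
        if lesson not in cluster.notes:         # prompt side
            cluster.notes.append(lesson)
            bullets = "\n".join(f"- {note}" for note in cluster.notes)
            cluster.prompt = f"{BASE_PROMPT}\n{bullets}"


# A small mixed stream of (task, lesson) pairs; the lessons are made up.
stream = [
    ("solve the equation x + 2 = 5", "Show each algebra step."),
    ("write a python function to sort a list", "Include a short test."),
    ("solve the equation 3x = 12", "Show each algebra step."),
    ("write a python function to reverse a string", "Include a short test."),
]

router = ToyRouter()
assignments = []
for task, lesson in stream:
    idx = router.route(task)
    router.update(idx, task, lesson)
    assignments.append(idx)

print(assignments)
print(router.clusters[0].prompt)
```

In this sketch the math tasks and the coding tasks end up in separate clusters, each with its own prompt. A real system would presumably replace the word-overlap score with learned representations and derive prompt updates from model feedback rather than fixed strings.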