Agents
HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools
HyDRA (Hybrid Dynamic Routing Architecture) introduces a novel framework for optimizing heterogeneous LLM pools by predicting fine-grained capability requirements per query and matching them to model profiles through a shortfall-matching algorithm. Utilizing a ModernBERT encoder with four independent sigmoid heads, HyDRA achieves a median CPU inference latency of 86 ms in production and allows for dynamic model configuration changes without retraining. The system demonstrates significant cost savings, achieving up to 72.5% reduction while maintaining competitive quality against established models like Claude Sonnet 4.6, making it a valuable tool for practitioners managing diverse LLM deployments.
routingmodel-poolsllm