Research
Design Methodology and Performance Trade-offs Management for Distributed and Compound AI Systems
The article presents a design methodology for Compound AI systems that shifts from a model-centric to a system-centric approach, orchestrating multiple models and algorithms to meet service-level objectives (SLOs) such as accuracy, latency, and cost. The methodology organizes design choices around workflow topology and configuration selection, and identifies eight design patterns that address the limitations of monolithic models. In the validation case studies, the resulting configurations reach accuracy within 2.5 to 4 percentage points of monolithic models while reducing latency by up to 60% and cost by up to 71%. These results highlight the need for automated systems to manage the growing complexity of the design space.
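To make the trade-off concrete, here is a minimal sketch of SLO-constrained configuration selection. It is not the article's method. It assumes each candidate workflow has already been profiled for accuracy, latency, and per-request cost. It keeps only the Pareto-optimal candidates, picks the most accurate one that satisfies the latency and cost limits, and reports its accuracy gap (in percentage points) and relative latency and cost reductions against a monolithic baseline. The names `Config`, `SLO`, `pareto_front`, `select_config`, and `compare_to_baseline` are hypothetical. The candidate values are made-up placeholders, not results from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    """A profiled candidate workflow (hypothetical structure)."""
    name: str
    accuracy: float    # fraction in [0, 1]
    latency_ms: float  # end-to-end latency per request
    cost_usd: float    # cost per request


@dataclass(frozen=True)
class SLO:
    """Service-level objectives a configuration must satisfy."""
    max_latency_ms: float
    max_cost_usd: float
    min_accuracy: float = 0.0


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
                and a.cost_usd <= b.cost_usd)
    better = (a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
              or a.cost_usd < b.cost_usd)
    return no_worse and better


def pareto_front(candidates: list[Config]) -> list[Config]:
    """Keep candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def select_config(candidates: list[Config], slo: SLO) -> Optional[Config]:
    """Most accurate Pareto-optimal config meeting the SLO; ties go to lower cost."""
    feasible = [c for c in pareto_front(candidates)
                if c.latency_ms <= slo.max_latency_ms
                and c.cost_usd <= slo.max_cost_usd
                and c.accuracy >= slo.min_accuracy]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, -c.cost_usd, -c.latency_ms))


def compare_to_baseline(c: Config, base: Config) -> tuple[float, float, float]:
    """(accuracy gap in percentage points, relative latency cut, relative cost cut)."""
    gap_pp = (base.accuracy - c.accuracy) * 100
    return gap_pp, 1 - c.latency_ms / base.latency_ms, 1 - c.cost_usd / base.cost_usd


# Hypothetical placeholder profiles, not measurements from the article.
CANDIDATES = [
    Config("monolithic", 0.90, 1000.0, 0.010),
    Config("cascade", 0.87, 450.0, 0.004),
    Config("small-only", 0.78, 200.0, 0.001),
    Config("ensemble", 0.88, 1200.0, 0.015),
]

if __name__ == "__main__":
    print([c.name for c in pareto_front(CANDIDATES)])
    chosen = select_config(CANDIDATES, SLO(max_latency_ms=500.0, max_cost_usd=0.005,
                                           min_accuracy=0.85))
    print(chosen)
    if chosen is not None:
        print(compare_to_baseline(chosen, CANDIDATES[0]))
```

With these placeholder numbers, the "ensemble" candidate is dropped because the monolithic baseline dominates it. Under the example SLO, the cascade is selected: it gives up 3 percentage points of accuracy for a 55% latency cut and a 60% cost cut. Brute-force filtering like this only works for a handful of candidates. The combinatorial growth of real design spaces is what motivates the automated management the article calls for.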
distributed-ai, performance-trade-offs, system-design