Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts
The paper introduces DiverseDistill, an interactive knowledge-distillation framework that improves the transfer of knowledge from large foundation models to compact, domain-specific models by using a diverse committee of teachers, including domain experts. The teachers remain frozen: a learnable Question-Answer mechanism aligns outputs without requiring co-training or architectural changes. The framework also reduces the number of forward passes by approximately 30% without loss of quality. In the reported evaluation, DiverseDistill recovers 73-114% of the performance gap, with model compression of up to 38x on recommendation tasks and 3.6x on vision tasks. These results make it a promising option for practitioners who need better distillation outcomes in resource-constrained environments.
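
The summary does not define how the "performance gap" is measured. A minimal sketch, assuming the common convention that gap recovery is the share of the teacher-minus-baseline-student difference closed by the distilled student, and that compression is the ratio of teacher to student parameter counts. The function names and all numbers below are hypothetical illustrations, not values from the paper.

```python
def gap_recovered(baseline: float, distilled: float, teacher: float) -> float:
    """Percentage of the teacher/student gap closed by distillation.

    Assumed definition (not taken from the paper):
        100 * (distilled - baseline) / (teacher - baseline)
    """
    gap = teacher - baseline
    if gap == 0:
        raise ValueError("teacher and baseline scores are equal; gap is undefined")
    return 100.0 * (distilled - baseline) / gap


def compression_ratio(teacher_params: int, student_params: int) -> float:
    """Assumed definition: teacher parameter count divided by student's."""
    return teacher_params / student_params


# Hypothetical scores on an arbitrary 0-100 metric.
baseline_student = 60.0   # student trained without distillation
teacher = 70.0            # large teacher model

partial = gap_recovered(baseline_student, 67.3, teacher)    # student below teacher
exceeding = gap_recovered(baseline_student, 71.4, teacher)  # student above teacher

print(round(partial, 1), round(exceeding, 1))
print(compression_ratio(380_000_000, 10_000_000))
```

Under this assumed definition, the two hypothetical students recover roughly 73% and 114% of the gap. A value above 100% would mean the distilled student scores higher than the teacher it was compared against.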