Agents
SkillMoV: Mixture-of-View Routing with Prototype-Conditioned Gating for Unified Multi-View Proficiency Estimation
SkillMoV is a framework for multi-scenario proficiency estimation from synchronized multi-view video. Its Mixture-of-View Projector (MoVP) applies a mixture-of-experts approach tailored to camera-specific features. The model reaches an overall accuracy of 50.17% on the EgoExo4D dataset, 3.57 percentage points above the previous state of the art, and stays parameter-efficient through LoRA adaptation, training only 23.32% of its parameters. For practitioners, this supports skill assessment across diverse domains and camera setups and addresses limitations of existing multi-view aggregation techniques.
multi-view, proficiency estimation
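The summary above does not spell out how the Mixture-of-View Projector routes camera features, so the following is only a minimal sketch of generic mixture-of-experts gating over per-view features, not the authors' implementation. The function name `mixture_of_view_projector`, the softmax gate, the linear experts, the mean pooling across views, and all shapes are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mixture_of_view_projector(view_feats, gate_w, expert_w):
    """Hypothetical MoE projector over camera views (illustrative only).

    view_feats: (V, D) array, one feature vector per camera view.
    gate_w:     (D, E) array, linear gate producing one logit per expert.
    expert_w:   (E, D, H) array, one linear projection per expert.

    Returns the fused feature of shape (H,) and the gate weights (V, E).
    """
    # Each view gets its own distribution over experts (assumed soft routing).
    gates = softmax(view_feats @ gate_w, axis=-1)                 # (V, E)
    # Apply every expert to every view.
    expert_out = np.einsum("vd,edh->veh", view_feats, expert_w)   # (V, E, H)
    # Combine expert outputs per view, weighted by that view's gate.
    per_view = np.einsum("ve,veh->vh", gates, expert_out)         # (V, H)
    # Assumed aggregation: simple mean over views.
    fused = per_view.mean(axis=0)                                 # (H,)
    return fused, gates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_views, feat_dim, num_experts, out_dim = 5, 16, 4, 8
    feats = rng.normal(size=(num_views, feat_dim))
    gate_w = rng.normal(size=(feat_dim, num_experts))
    expert_w = rng.normal(size=(num_experts, feat_dim, out_dim))
    fused, gates = mixture_of_view_projector(feats, gate_w, expert_w)
    print(fused.shape, gates.shape)
```

In this sketch every view is routed softly to all experts. A sparse top-k router or a gate conditioned on learned prototypes, as the title suggests, would replace the `gates` computation while leaving the weighted combination unchanged.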