Agents · arXiv cs.AI · 12 d ago

Divide, Deliberate, Decide: A Multi-Agent Framework for Fine-Grained Egocentric Action Recognition

The paper presents "Divide, Deliberate, Decide", a multi-agent framework for fine-grained egocentric action recognition that runs fully locally and in a zero-shot manner. A Vision-Language Model (VLM) orchestrator segments the video and proposes candidate labels. A set of diverse VLM specialists then deliberates over those candidates, and a Borda count produces the final ranking. The method improves zero-shot performance by exploiting the diversity of model priors, with no fine-tuning required, which makes it relevant to practitioners working on action recognition where the visual differences between actions are subtle.
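The Borda count step can be illustrated with a minimal sketch. This is the textbook Borda rule, not the paper's code: the function name `borda_rank`, the example labels, the assumption that every specialist returns a best-first ranking, and the alphabetical tie-break are all hypothetical choices made for illustration.

```python
def borda_rank(rankings):
    """Aggregate best-first rankings with a standard Borda count.

    Assumption: each ranking is a list of candidate labels ordered from
    most to least preferred. With n distinct candidates overall, the label
    at position i (0-based) earns n - 1 - i points; a label missing from
    a ranking earns nothing from that ranking.
    """
    candidates = sorted({label for ranking in rankings for label in ranking})
    n = len(candidates)
    scores = {label: 0 for label in candidates}
    for ranking in rankings:
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    # Highest score first; ties broken alphabetically (an arbitrary choice).
    return sorted(candidates, key=lambda label: (-scores[label], label))


# Hypothetical rankings from three specialists over three candidate labels.
specialist_rankings = [
    ["cut onion", "peel onion", "chop carrot"],
    ["peel onion", "cut onion", "chop carrot"],
    ["cut onion", "chop carrot", "peel onion"],
]
print(borda_rank(specialist_rankings))
```

Here "cut onion" collects 5 points, "peel onion" 3, and "chop carrot" 1, so the aggregate order follows the majority preference even though the specialists disagree on individual positions.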

action recognition · multi-agent
