Kwai-Keye/Keye-VL-2.0-30B-A3B-GGUF · Hugging Face
Kwai-Keye has released Keye-VL-2.0-30B-A3B, a 30-billion-parameter model designed for long-video understanding and multimodal agent tasks. Its DSA-native long-context architecture uses sparse attention to process hour-long videos efficiently. The model is reported to lead video-comprehension benchmarks, and its post-training mechanisms are aimed at strengthening reasoning and reducing hallucinations. For practitioners, the notable addition is integrated agent functionality that supports complex tasks such as tool use and web search.
keye, hugging_face, video_understanding