Multimodal
VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference
VigilFormer is a framework for video anomaly detection that combines deformable spatio-temporal attention with causal temporal modeling. Its Deformable Spatio-Temporal Encoder (DSTE) attends efficiently to key locations in video frames, and its Causal Anomaly Classifier (CAC) uses dilated causal convolutions to classify anomalies without frame-level labels. VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% on benchmark datasets while running at 41.5 FPS on a single GPU. This balance of detection accuracy and real-time throughput makes it relevant to practitioners building surveillance applications.
video, anomaly detection, causal inference
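The summary mentions dilated causal convolutions without showing what they compute. The sketch below is a minimal, single-channel illustration of the general technique in plain Python, not VigilFormer's CAC implementation: the function names, kernel weights, and input values are all hypothetical. It assumes the usual definition, in which the output at time t depends only on inputs at t, t - d, t - 2d, and so on, with missing past values treated as zero.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Single-channel dilated causal convolution (illustrative sketch).

    y[t] = sum_k weights[k] * x[t - k * dilation]
    Indices before the start of the sequence count as zero (left padding),
    so y[t] never depends on any x[s] with s > t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the sequence start
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past-and-present time steps seen by a stack of causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical per-frame feature values and kernel weights (assumptions).
frame_features = [0.1, 0.2, 0.9, 0.3, 0.1, 0.0]
kernel = [0.5, 0.3, 0.2]

y = causal_dilated_conv1d(frame_features, kernel, dilation=2)

# Causality check: changing the last frame leaves all earlier outputs unchanged.
perturbed = frame_features[:-1] + [5.0]
y_perturbed = causal_dilated_conv1d(perturbed, kernel, dilation=2)
```

Stacking such layers with growing dilations widens the temporal context quickly: under these assumptions, three layers with kernel size 3 and dilations 1, 2, and 4 cover `receptive_field(3, [1, 2, 4])` time steps while each output still depends only on current and past frames, which is what allows frame-by-frame scoring on a live stream.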