Ummay Maria Muna, Shanta Biswas, Syed Abu Ammar Muhammad Zarif, Philip Jefferson Deori, Tauseef Tajwar, and Dr. Swakkhar Shatabda
Automated video anomaly detection (VAD) is challenging because anomalies are context-dependent and occur sporadically. Recent advances in deep learning offer promising solutions. In this paper, we propose a VAD method based on spatio-temporal analysis. To handle lengthy videos and the sparsity of anomalous content within an anomalous video, we segment and label the anomalous parts, integrate a sliding window system, and employ multilevel embedding creation techniques. We enhance feature representation with a customized ResNet50 and introduce the parameter-efficient SRU++ recurrent model with an attention mechanism to process embedding sequences efficiently. We also incorporate a cluster-based weighting mechanism to further improve prediction capability. Extensive evaluation using different approaches on the UCF Crime dataset shows that our method outperforms state-of-the-art methods, making it suitable for real-world surveillance scenarios.
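The sliding window segmentation and the labeling of anomalous parts can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `sliding_windows` and `label_windows`, the window and stride values, and the overlap rule for assigning labels are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open frame range [start, end)


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Span]:
    """Split a video of num_frames frames into overlapping windows.

    Hypothetical helper: the window and stride sizes are free parameters here.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        return [(0, num_frames)]
    spans = [(s, s + window) for s in range(0, num_frames - window + 1, stride)]
    # Add one final window flush with the end so trailing frames are covered.
    if spans[-1][1] < num_frames:
        spans.append((num_frames - window, num_frames))
    return spans


def label_windows(
    spans: Sequence[Span],
    anomalous_segments: Sequence[Span],
    min_overlap: int = 1,
) -> List[int]:
    """Label a window 1 if it overlaps an annotated anomalous segment.

    Assumed rule: an overlap of at least min_overlap frames marks the window
    as anomalous; every other window is labeled 0 (normal).
    """
    labels = []
    for start, end in spans:
        overlap = max(
            (min(end, a_end) - max(start, a_start)
             for a_start, a_end in anomalous_segments),
            default=0,
        )
        labels.append(1 if overlap >= min_overlap else 0)
    return labels


if __name__ == "__main__":
    spans = sliding_windows(num_frames=10, window=4, stride=3)
    print(spans)
    print(label_windows(spans, anomalous_segments=[(5, 8)]))
```

Labeling windows rather than whole videos is one way to address anomaly sparsity: only the windows that touch an annotated anomalous segment carry the positive label, so the long normal stretches of an anomalous video are not treated as anomalous.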