ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence

Cited by: 34
Authors
Rendon-Segador, Fernando J. [1]
Alvarez-Garcia, Juan A. [1]
Enriquez, Fernando [1]
Deniz, Oscar [2]
Affiliations
[1] Univ Seville, Dept Lenguajes Sist Informat, Seville 41012, Spain
[2] Univ Castilla La Mancha, VISILAB ETSII, Ciudad Real 13071, Spain
Keywords
violence detection; fight detection; deep learning; dense net; bidirectional ConvLSTM; VIDEO; SURVEILLANCE; RECOGNITION; FRAMEWORK;
DOI
10.3390/electronics10131601
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Introducing efficient automatic violence detection in video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies, and those in charge of monitoring social network content. In this paper we present a new deep learning architecture that combines an adapted version of DenseNet for three dimensions, a multi-head self-attention layer, and a bidirectional convolutional long short-term memory (ConvLSTM) module; it encodes relevant spatio-temporal features to determine whether a video is violent or not. Furthermore, an ablation study of the input frames is carried out, comparing dense optical flow with adjacent-frame subtraction and measuring the influence of the attention layer; it shows that the combination of optical flow and the attention mechanism improves results by up to 4.4%. Experiments were conducted on four of the most widely used datasets for this problem. The model matches or, in some cases, exceeds the state of the art while reducing the number of network parameters needed (4.5 million) and improving test accuracy (from 95.6% on the most complex dataset to 100% on the simplest one) and inference time (less than 0.3 s for the longest clips). Finally, to check whether the generated model is able to generalize violence detection, a cross-dataset analysis is performed, which shows the complexity of this approach: when training on three datasets and testing on the remaining one, accuracy drops to 70.08% in the worst case and to 81.51% in the best case, which points to future work oriented towards anomaly detection in new datasets.
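One of the two input representations compared in the ablation study, adjacent-frame subtraction, can be illustrated with a minimal NumPy sketch. The function name `adjacent_frame_differences`, the (T, H, W, C) clip layout, and the choice of a signed float32 difference are assumptions made for illustration; the abstract does not specify how the authors compute or normalize these differences.

```python
import numpy as np


def adjacent_frame_differences(frames: np.ndarray) -> np.ndarray:
    """Return frame[t + 1] - frame[t] for every adjacent pair in a clip.

    `frames` is assumed to have shape (T, H, W, C). The result has shape
    (T - 1, H, W, C) and dtype float32, so negative changes are preserved
    instead of wrapping around as they would with uint8 arithmetic.
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")
    clip = frames.astype(np.float32)
    return clip[1:] - clip[:-1]


# Hypothetical 3-frame, 2x2, single-channel clip used only as a toy input.
toy_clip = np.zeros((3, 2, 2, 1), dtype=np.uint8)
toy_clip[1] = 10
toy_clip[2] = 4
toy_diffs = adjacent_frame_differences(toy_clip)
```

Static background cancels out in such differences, so the resulting tensor mostly carries motion, which is the same kind of cue that dense optical flow provides at a higher computational cost.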
Pages: 16