ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence

Cited by: 34
Authors
Rendon-Segador, Fernando J. [1]
Alvarez-Garcia, Juan A. [1]
Enriquez, Fernando [1]
Deniz, Oscar [2]
Affiliations
[1] Univ Seville, Dept Lenguajes Sist Informat, Seville 41012, Spain
[2] Univ Castilla La Mancha, VISILAB ETSII, Ciudad Real 13071, Spain
Keywords
violence detection; fight detection; deep learning; dense net; bidirectional ConvLSTM; VIDEO; SURVEILLANCE; RECOGNITION; FRAMEWORK
DOI
10.3390/electronics10131601
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Introducing efficient automatic violence detection in video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies and those in charge of monitoring social network content. In this paper we present a new deep learning architecture that determines whether a video is violent or not. It combines an adapted version of DenseNet for three dimensions, a multi-head self-attention layer and a bidirectional convolutional long short-term memory (LSTM) module to encode relevant spatio-temporal features. Furthermore, an ablation study is carried out on the input frames, comparing dense optical flow with adjacent-frame subtraction, and on the influence of the attention layer. It shows that the combination of optical flow and the attention mechanism improves results by up to 4.4%. Experiments were conducted on four of the most widely used datasets for this problem. The results match or in some cases exceed the state of the art while reducing the number of network parameters needed (4.5 million) and improving both test accuracy (from 95.6% on the most complex dataset to 100% on the simplest one) and inference time (less than 0.3 s for the longest clips). Finally, to check whether the generated model is able to generalize violence, a cross-dataset analysis is performed, which shows the complexity of this approach: when training on three datasets and testing on the remaining one, the accuracy drops to 70.08% in the worst case and to 81.51% in the best case, which points to future work oriented towards anomaly detection in new datasets.
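The cross-dataset protocol mentioned in the abstract (train on three datasets, test on the remaining one) can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_dataset_out` and the dataset names are hypothetical placeholders, since the abstract does not name the four datasets, and the sketch only enumerates the splits without training any model.

```python
from typing import List, Tuple


def leave_one_dataset_out(names: List[str]) -> List[Tuple[List[str], str]]:
    """Return one (training datasets, held-out test dataset) pair per dataset."""
    splits = []
    for held_out in names:
        # Train on every dataset except the one held out for testing.
        train = [n for n in names if n != held_out]
        splits.append((train, held_out))
    return splits


# Placeholder names (assumption): the abstract does not list the datasets.
DATASETS = ["dataset_A", "dataset_B", "dataset_C", "dataset_D"]

for train, test in leave_one_dataset_out(DATASETS):
    print(f"train on {train}, test on {test}")
```

With four datasets this yields four splits, each with three training sets. The reported worst-case (70.08%) and best-case (81.51%) accuracies would correspond to two of those four held-out runs.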
Pages: 16