Harnessing high-level concepts, visual, and auditory features for violence detection in videos

Cited by: 11
Authors
Peixoto, Bruno M. [1]
Lavi, Bahram [1]
Dias, Zanoni [1]
Rocha, Anderson [1]
Affiliation
[1] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP)
Keywords
NETWORKS
DOI
10.1016/j.jvcir.2021.103174
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In detecting sensitive media, violence is one of the hardest concepts to define objectively and, thus, one of the most challenging to detect automatically. While many studies have addressed specific aspects of violence, very few attempt to tackle the general concept. We propose a method that aims to enable machines to understand the high-level concept of violence by first breaking it down into smaller, more objective sub-concepts, such as fights, explosions, blood, and gunshots, and later combining them, leading to a better understanding of the scene. For this, we leverage the characteristics of each individual sub-concept of violence (relying upon custom-tailored convolutional neural networks) to guide how it should be described. A fight scene should incorporate temporal features that a scene with blood does not need to describe. A scene with explosions or gunshots should weigh its audio features more heavily. With this multimodal approach, we trained visual and auditory feature detectors and later combined them into a decision neural network, yielding a violence detector that considers several different aspects of the problem. This robust and modular approach allows different cultures and users to adapt the detector to their specific needs.
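The abstract describes a late-fusion design. Per-sub-concept detectors each score a clip, and a decision network combines those scores into one violence estimate. Below is a minimal sketch of that fusion step under several assumptions. The detector names (`fight_visual`, `gunshot_audio`, and so on) are hypothetical. The training scores are made up. A single sigmoid unit (logistic regression fitted by full-batch gradient descent) stands in for the decision neural network, whose architecture is not given here. This is not the authors' implementation, and the CNN feature detectors themselves are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical detector outputs: one score in [0, 1] per (sub-concept, modality)
# pair. The sub-concepts come from the abstract; which modality each detector
# uses here is an illustrative assumption.
DETECTORS: Tuple[str, ...] = (
    "fight_visual",      # e.g. a temporal (clip-level) visual model
    "blood_visual",      # e.g. a frame-level visual model
    "explosion_visual",
    "explosion_audio",
    "gunshot_audio",
)


def to_vector(scores: Dict[str, float], detectors: Sequence[str] = DETECTORS) -> List[float]:
    """Order detector scores into a fixed-length feature vector."""
    return [scores[name] for name in detectors]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_fusion(X: List[List[float]], y: List[int], lr: float = 1.0,
                 epochs: int = 20000) -> Tuple[List[float], float]:
    """Fit one sigmoid unit (logistic regression) by full-batch gradient descent.

    Stand-in for the decision neural network; its real architecture is unknown here.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean cross-entropy loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return the fused violence probability for one clip."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up training data: violent clips trigger at least one sub-concept
# detector strongly; non-violent clips keep every detector low.
X_train = [
    [0.9, 0.1, 0.1, 0.1, 0.1],  # fight
    [0.1, 0.9, 0.1, 0.1, 0.1],  # blood
    [0.1, 0.1, 0.8, 0.9, 0.2],  # explosion (seen and heard)
    [0.1, 0.1, 0.1, 0.2, 0.9],  # gunshot
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.2],
    [0.1, 0.1, 0.2, 0.1, 0.1],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_fusion(X_train, y_train)
clip = to_vector({"fight_visual": 0.1, "blood_visual": 0.1,
                  "explosion_visual": 0.1, "explosion_audio": 0.2,
                  "gunshot_audio": 0.9})
print(f"violence probability: {predict(w, b, clip):.3f}")
```

The fusion step only sees a fixed-order vector of detector scores. That makes the kind of adaptation the abstract mentions cheap in this sketch. You pass a different `detectors` tuple to `to_vector`, for example one without `blood_visual`, and refit the fusion weights. The per-concept detectors stay untouched.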
Pages: 14
Related papers
50 in total
  • [1] High-level Features for Multimodal Deception Detection in Videos
    Rill-Garcia, Rodrigo
    Jair Escalante, Hugo
    Villasenor-Pineda, Luis
    Reyes-Meza, Veronica
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1565 - 1573
  • [2] Detection of Violence in Cartoon Videos Using Visual Features
    Khalil, Tahira
    Bangash, Javed Iqbal
    Khan, Abdul Waheed
    Lashari, Saima Anwar
    Khan, Abdullah
    Ramli, Dzati Athiar
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2021), 2021, 192 : 4962 - 4971
  • [3] Gradual Chroma Reduction and High-Level Visual Masking in Videos
    Harvey, Jim
    Le Moan, Steven
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 151 - 155
  • [4] Auditory motion direction encoding in auditory cortex and high-level visual cortex
    Alink, Arjen
    Euler, Felix
    Kriegeskorte, Nikolaus
    Singer, Wolf
    Kohler, Axel
    [J]. HUMAN BRAIN MAPPING, 2012, 33 (04) : 969 - 978
  • [5] High-Level Visual Features for Underwater Place Recognition
    Li, Jie
    Eustice, Ryan M.
    Johnson-Roberson, Matthew
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2015, : 3652 - 3659
  • [6] Combination of high-level features with low-level features for detection of pedestrian
    Takarli, Fariba
    Aghagolzadeh, Ali
    Seyedarabi, Hadi
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (01) : 93 - 101
  • [8] Low level Visio-Temporal Features for Violence Detection in Cartoon Videos
    Khalil, Tahira
    Bangash, Javed Iqbal
    Abdusalam
    Adnan, Awais
    [J]. 2016 SIXTH INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING TECHNOLOGY (INTECH), 2016, : 320 - 325
  • [9] Object Detection by Estimating and Combining High-Level Features
    Levine, Geoffrey
    DeJong, Gerald
    [J]. IMAGE ANALYSIS AND PROCESSING - ICIAP 2009, PROCEEDINGS, 2009, 5716 : 161 - 169
  • [10] Recognizing high-level audio-visual concepts using context
    Naphade, MR
    Huang, TS
    [J]. 2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2001, : 46 - 49