Harnessing high-level concepts, visual, and auditory features for violence detection in videos

Cited by: 11
Authors
Peixoto, Bruno M. [1]
Lavi, Bahram [1]
Dias, Zanoni [1]
Rocha, Anderson [1]
Affiliation
[1] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP)
Keywords
NETWORKS;
DOI
10.1016/j.jvcir.2021.103174
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In detecting sensitive media, violence is one of the hardest to define objectively, and thus, a significant challenge to detect automatically. While many studies were conducted in detecting aspects of violence, very few try to approach the general concept. We propose a method that aims to enable machines to understand a high-level concept of violence by first breaking it down into smaller, more objective ones, such as fights, explosions, blood, and gunshots, to combine them later, leading to a better understanding of the scene. For this, we leverage characteristics of each individual sub-concept of violence (relying upon custom-tailored convolutional neural networks) to guide how they should be described. A fight scene should incorporate temporal features that a scene with blood does not need to describe. A scene with explosions or gunshots should weigh more on its audio features. With this multimodal approach, we trained visual and auditory feature detectors and later combined them into a decision neural network to give us a violence detector that considers several different aspects of the problem. This robust and modular approach allows different cultures and users to adapt the detector to their specific needs.
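The fusion idea in the abstract can be illustrated with a minimal sketch. It assumes each sub-concept detector already outputs a visual and an auditory score in [0, 1]; the weight table, the function names `fuse_concept` and `violence_score`, and the single logistic unit standing in for the decision neural network are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math

# Hypothetical per-concept audio weights (assumptions, not from the paper):
# explosions and gunshots lean on audio, fights and blood lean on visuals.
AUDIO_WEIGHT = {"fights": 0.2, "explosions": 0.7, "blood": 0.0, "gunshots": 0.7}


def fuse_concept(concept, visual_score, audio_score):
    """Blend one concept's visual and audio scores (both in [0, 1])."""
    w = AUDIO_WEIGHT[concept]
    return (1.0 - w) * visual_score + w * audio_score


def violence_score(visual, audio, concept_weights, bias=0.0):
    """Combine fused concept scores with a single logistic unit.

    This unit is a simplified stand-in for a trained decision network.
    """
    z = bias
    for concept, weight in concept_weights.items():
        z += weight * fuse_concept(concept, visual[concept], audio[concept])
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions, the modularity mentioned in the abstract corresponds to editing `concept_weights`: setting the weight for `"blood"` to zero, for example, yields a detector that ignores that sub-concept.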
Pages: 14