Learning weakly supervised audio-visual violence detection in hyperbolic space

Times cited: 0

Authors:

Zhou, Xiao [1]

Peng, Xiaogang [1]

Wen, Hao [2]

Luo, Yikai [1]

Yu, Keyang [1]

Yang, Ping [1]

Wu, Zizhao [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Digital Media & Technol, Hangzhou, Peoples R China

[2] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2024 / Vol. 151

Keywords:

Weakly supervised learning; Hyperbolic space; Video violence detection

DOI:

10.1016/j.imavis.2024.105286

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, weakly supervised audio-visual violence detection has gained considerable attention. The goal of this task is to identify violent segments within multimodal data using only video-level labels. Despite advances in this field, the Euclidean neural networks used in prior research struggle to capture highly discriminative representations because of the limitations of Euclidean feature space. To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination. We contribute two branches of fully hyperbolic graph convolutional networks that explore feature similarities and temporal relationships among snippets in hyperbolic space. By learning snippet representations in this space, the framework effectively captures the semantic discrepancies between violent and normal snippets. Extensive experiments on the XD-Violence benchmark demonstrate that our method achieves 85.67% AP, outperforming state-of-the-art methods by a sizable margin.
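The abstract does not state which hyperbolic model or graph construction HyperVD uses. As a minimal sketch, assuming the Lorentz (hyperboloid) model with curvature -1, the code below shows how Euclidean snippet features could be mapped into hyperbolic space with the exponential map at the origin, how geodesic distances are measured there, and how two snippet graphs (one from feature similarity, one from temporal position) might be built. The function names, the softmax-over-negative-distance similarity, and the `exp(-|i-j|/sigma)` temporal prior are all hypothetical illustrations, not the authors' implementation, and the fully hyperbolic graph convolution itself is not shown.

```python
import math

# Assumption: Lorentz (hyperboloid) model, curvature -1.
# Points x = (x0, x1, ..., xn) satisfy <x, x>_L = -1 with x0 > 0.


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(v):
    """Map a Euclidean feature v (a tangent vector at the origin) onto the hyperboloid."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [1.0] + [0.0] * len(v)  # the origin (1, 0, ..., 0)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]


def lorentz_distance(x, y):
    """Geodesic distance d(x, y) = arcosh(-<x, y>_L)."""
    # Clamp so floating-point round-off never pushes the argument below 1.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def similarity_adjacency(points):
    """Hypothetical feature-similarity graph: row-wise softmax of negative distances."""
    adj = []
    for p in points:
        scores = [-lorentz_distance(p, q) for q in points]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def temporal_adjacency(n, sigma=1.0):
    """Hypothetical temporal graph: weights decay with snippet index gap |i - j|."""
    return [[math.exp(-abs(i - j) / sigma) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    snippets = [[0.1, 0.2], [0.1, 0.25], [2.0, -1.5]]  # toy Euclidean snippet features
    points = [expmap0(v) for v in snippets]
    print(similarity_adjacency(points)[0])
    print(temporal_adjacency(3, sigma=1.0)[0])
```

In this toy run, the first two snippets have nearly identical features, so their mutual similarity weight is much larger than either one's weight to the third snippet. Note that the distance from the origin to `expmap0(v)` equals the Euclidean norm of `v`, which is one way to check that the mapping lands on the hyperboloid correctly.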


Pages: 10
