Learning weakly supervised audio-visual violence detection in hyperbolic space

Cited by: 0

Authors:

Zhou, Xiao [1]

Peng, Xiaogang [1]

Wen, Hao [2]

Luo, Yikai [1]

Yu, Keyang [1]

Yang, Ping [1]

Wu, Zizhao [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Digital Media & Technol, Hangzhou, Peoples R China

[2] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2024 / Volume 151

Keywords:

Weakly supervised learning; Hyperbolic space; Video violence detection;

DOI:

10.1016/j.imavis.2024.105286

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, the task of weakly supervised audio-visual violence detection has gained considerable attention. The goal of this task is to identify violent segments within multimodal data based on video-level labels. Despite advances in this field, traditional Euclidean neural networks, which have been used in prior research, encounter difficulties in capturing highly discriminative representations due to limitations of the feature space. To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination. We contribute two branches of fully hyperbolic graph convolutional networks that excavate feature similarities and temporal relationships among snippets in hyperbolic space. By learning snippet representations in this space, the framework effectively learns semantic discrepancies between violent snippets and normal ones. Extensive experiments on the XD-Violence benchmark demonstrate that our method achieves 85.67% AP, outperforming the state-of-the-art methods by a sizable margin.
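
As a rough illustration of what "learning snippet embeddings in hyperbolic space" can mean, the sketch below projects Euclidean feature vectors onto the unit Poincare ball and measures the distance between them there. The abstract does not say which hyperbolic model HyperVD uses, so the Poincare ball with curvature -1, the function names `exp_map_origin` and `poincare_distance`, and the toy 2-D vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def exp_map_origin(v):
    """Map a Euclidean (tangent) vector at the origin onto the unit Poincare ball.

    Assumes curvature -1. Very large norms saturate tanh at 1.0 in floating
    point, so a real implementation would clip the result away from the boundary.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(norm) / norm
    return [scale * c for c in v]


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


# Hypothetical 2-D snippet features, used only to exercise the functions.
snippet_a = exp_map_origin([0.3, 0.4])
snippet_b = exp_map_origin([-0.3, 0.4])
print(poincare_distance(snippet_a, snippet_b))
```

In this model, distances grow quickly as points approach the boundary of the ball, which is the kind of geometric property usually cited as a motivation for hyperbolic embeddings.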

Pages: 10
