A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos

Cited by: 10

Authors:

Xuan, Hanyu [1]

Wu, Zhiliang [1]

Yang, Jian [1]

Yan, Yan [2]

Alameda-Pineda, Xavier [3]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China

[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA

[3] Univ Grenoble Alpes, LJK, Grenoble INP, INRIA, CNRS, F-38000 Grenoble, France

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

DOI:

10.1109/CVPR52688.2022.00110

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Humans can easily recognize where and how a sound is produced by watching a scene and listening to the corresponding audio cues. To achieve such cross-modal perception on machines, existing methods only use the maps generated by interpolation operations to localize the sound source. As semantic object-level localization is more attractive for potential practical applications, we argue that these existing map-based approaches only provide a coarse-grained and indirect description of the sound source. In this paper, we advocate a novel proposal-based paradigm that can directly perform semantic object-level localization, without any manual annotations. We incorporate the global response map as an unsupervised spatial constraint to weight the proposals according to how well they cover the estimated global shape of the sound source. As a result, our proposal-based sound source localization can be cast into a simpler Multiple Instance Learning (MIL) problem by filtering those instances corresponding to large sound-unrelated regions. Our method achieves state-of-the-art (SOTA) performance when compared to several baselines on multiple datasets.
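The proposal-weighting idea in the abstract can be illustrated with a small sketch. This is not the paper's formulation: the abstract does not give the weighting function, so the sketch assumes a simple intersection-over-union between each box proposal and a thresholded response map. The names `proposal_weights` and `filter_proposals`, and the `threshold` and `min_weight` values, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def proposal_weights(response: List[List[float]], boxes: List[Box],
                     threshold: float = 0.5) -> List[float]:
    """Weight each box by its IoU with the thresholded response map.

    Assumption: IoU stands in for "how well a proposal covers the
    estimated global shape of the sound source".
    """
    # Pixels treated as part of the estimated sound-source shape.
    source = {(x, y)
              for y, row in enumerate(response)
              for x, value in enumerate(row)
              if value >= threshold}
    weights = []
    for x0, y0, x1, y1 in boxes:
        box = {(x, y) for y in range(y0, y1) for x in range(x0, x1)}
        union = len(box | source)
        weights.append(len(box & source) / union if union else 0.0)
    return weights


def filter_proposals(boxes: List[Box], weights: List[float],
                     min_weight: float = 0.3) -> List[Box]:
    """Drop low-weight proposals so fewer instances remain in the MIL bag."""
    return [box for box, weight in zip(boxes, weights) if weight >= min_weight]


# Toy 4x4 response map whose top-left 2x2 region responds strongly.
response_map = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
candidate_boxes = [(0, 0, 2, 2), (0, 0, 4, 4), (2, 2, 4, 4)]
weights = proposal_weights(response_map, candidate_boxes)
print(weights)  # [1.0, 0.25, 0.0]
print(filter_proposals(candidate_boxes, weights))  # [(0, 0, 2, 2)]
```

In this toy case the tight box receives the highest weight, while the full-frame box and the box over a silent region are filtered out, which mirrors the abstract's point about removing instances that correspond to large sound-unrelated regions.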


Pages: 1019-1028

Page count: 10

