LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES

Cited by: 11

Authors:

Senocak, Arda [1]

Ryu, Hyeonggon [1]

Kim, Junsik [2]

Kweon, In So [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[2] Harvard Univ, Cambridge, MA 02138 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Funding:

National Research Foundation, Singapore;

Keywords:

audio-visual learning; audio-visual sound localization; audio-visual correspondence; self-supervised;

DOI:

10.1109/ICASSP43922.2022.9747867

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The objective of this work is to localize sound sources in visual scenes. Existing audio-visual works employ contrastive learning, treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information. Such semantically correlated pairs, termed "hard positives", are therefore mistakenly grouped as negatives. Our key contribution is showing that hard positives can give response maps similar to those of the corresponding pairs. Our approach incorporates these hard positives by adding their response maps directly into a contrastive learning objective. We demonstrate the effectiveness of our approach on the VGG-SS and SoundNet-Flickr test sets, showing favorable performance compared with state-of-the-art methods.
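
The idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one global embedding per audio clip and per image (the paper works with response maps), it mines hard positives as the k nearest neighbors in audio embedding space, and the names `hard_positive_mask`, `contrastive_loss`, `k`, and `tau` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def hard_positive_mask(audio, k=1):
    """Boolean (N, N) mask: each sample's own pair plus its k nearest
    audio neighbors are marked as positives (assumed mining rule)."""
    a = l2_normalize(audio)
    sim = a @ a.T
    np.fill_diagonal(sim, -np.inf)            # exclude self from the neighbor search
    nn = np.argsort(-sim, axis=1)[:, :k]      # indices of the k most similar samples
    mask = np.eye(len(audio), dtype=bool)     # the corresponding pair is always positive
    rows = np.arange(len(audio))[:, None]
    mask[rows, nn] = True                     # add the mined hard positives
    return mask


def contrastive_loss(audio, visual, pos_mask, tau=0.07):
    """InfoNCE-style loss where every entry flagged in pos_mask counts as a positive."""
    logits = l2_normalize(audio) @ l2_normalize(visual).T / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    pos = (exp * pos_mask).sum(axis=1)        # mass assigned to positives
    return float(-np.log(pos / exp.sum(axis=1)).mean())


rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
visual = rng.normal(size=(8, 16))

plain = contrastive_loss(audio, visual, np.eye(8, dtype=bool))
hard = contrastive_loss(audio, visual, hard_positive_mask(audio, k=2))
print(f"diagonal-only positives: {plain:.3f}, with hard positives: {hard:.3f}")
```

With a diagonal-only mask this reduces to the usual setup in which every mismatched pair is a negative. Flagging mined neighbors as positives moves their similarity from the denominator-only side into the numerator, so semantically matched pairs are no longer pushed apart.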


Pages: 4863-4867

Page count: 5
