LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES

Cited by: 11
Authors
Senocak, Arda [1]
Ryu, Hyeonggon [1]
Kim, Junsik [2]
Kweon, In So [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Harvard Univ, Cambridge, MA 02138 USA
Funding
National Research Foundation of Singapore;
Keywords
audio-visual learning; audio-visual sound localization; audio-visual correspondence; self-supervised;
DOI
10.1109/ICASSP43922.2022.9747867
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The objective of this work is to localize sound sources in visual scenes. Existing audio-visual works employ contrastive learning, treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information. As a result, such semantically correlated pairs, which we call "hard positives", are mistakenly grouped as negatives. Our key contribution is showing that hard positives can give response maps similar to those of the corresponding pairs. Our approach incorporates these hard positives by adding their response maps directly into the contrastive learning objective. We demonstrate the effectiveness of our approach on the VGG-SS and SoundNet-Flickr test sets, showing favorable performance compared to state-of-the-art methods.
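The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`response_maps`, `contrastive_loss`), the max-pooling of each response map into a single score, the temperature `tau`, and the boolean `hard_pos_mask` marking which mismatched pairs count as hard positives are all assumptions made for illustration. The abstract does not say how hard positives are mined or how the response maps enter the loss, so an InfoNCE-style objective is used here as a stand-in.

```python
import numpy as np


def response_maps(visual, audio):
    """Cosine-similarity maps between every image and every audio clip.

    visual: (B, C, H, W) image feature maps; audio: (B, C) audio embeddings.
    Returns an array of shape (B, B, H, W), where entry [i, j] is the
    response map of image i to audio j.
    """
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + 1e-8)
    a = audio / (np.linalg.norm(audio, axis=1, keepdims=True) + 1e-8)
    return np.einsum("ichw,jc->ijhw", v, a)


def contrastive_loss(visual, audio, hard_pos_mask=None, tau=0.07):
    """InfoNCE-style loss with optional hard positives (illustrative only).

    hard_pos_mask: optional (B, B) boolean array (hypothetical input);
    True at [i, j] means audio j is treated as a hard positive for image i.
    """
    batch = visual.shape[0]
    maps = response_maps(visual, audio)
    # Assumption: pool each response map to one score with a spatial max.
    scores = maps.reshape(batch, batch, -1).max(axis=2)
    logits = scores / tau
    # Corresponding pairs (the diagonal) are always positives.
    positives = np.eye(batch, dtype=bool)
    if hard_pos_mask is not None:
        positives = positives | np.asarray(hard_pos_mask, dtype=bool)
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    numerator = (exp * positives).sum(axis=1)
    denominator = exp.sum(axis=1)
    return float(-np.log(numerator / denominator).mean())


rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 8, 3, 3))
audio = rng.normal(size=(4, 8))

# Treat samples 0 and 1 as semantically similar (an arbitrary choice here).
mask = np.zeros((4, 4), dtype=bool)
mask[0, 1] = mask[1, 0] = True

baseline = contrastive_loss(visual, audio)
with_hard_positives = contrastive_loss(visual, audio, mask)
```

In this sketch, moving a pair from the negative set to the positive set only enlarges the numerator, so the loss stops penalizing the model for responding to semantically matched but mismatched pairs.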
Pages: 4863-4867
Page count: 5