LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES

Cited by: 11
Authors
Senocak, Arda [1]
Ryu, Hyeonggon [1]
Kim, Junsik [2]
Kweon, In So [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Harvard Univ, Cambridge, MA 02138 USA
Funding
National Research Foundation, Singapore
Keywords
audio-visual learning; audio-visual sound localization; audio-visual correspondence; self-supervised
DOI
10.1109/ICASSP43922.2022.9747867
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The objective of this work is to localize sound sources in visual scenes. Existing audio-visual works employ contrastive learning, treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information, so such semantically correlated pairs, "hard positives", are mistakenly grouped as negatives. Our key contribution is showing that hard positives can give response maps similar to those of the corresponding pairs. Our approach incorporates these hard positives by adding their response maps directly into the contrastive learning objective. We demonstrate the effectiveness of our approach on the VGG-SS and SoundNet-Flickr test sets, where it performs favorably against state-of-the-art methods.
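The idea in the abstract can be illustrated with a minimal sketch. It is not the formulation from the paper: the function names, the cosine-similarity response map, the peak pooling, the temperature `tau`, and the boolean `hard_pos` mask that flags semantically similar pairs are all assumptions made for illustration. The sketch only shows how treating hard positives as positives changes an InfoNCE-style objective.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv + 1e-8)

def response_map(visual_locs, audio_vec):
    """Similarity of one audio embedding to every spatial location of one image.

    visual_locs: list of per-location feature vectors (a flattened H*W grid).
    """
    return [cosine(loc, audio_vec) for loc in visual_locs]

def loss_with_hard_positives(visual, audio, hard_pos, tau=0.07):
    """InfoNCE-style loss where flagged pairs count as positives (assumed form).

    hard_pos[i][j] is True when audio j is treated as a hard positive for
    image i. How that mask is obtained is outside this sketch.
    """
    n = len(visual)
    total = 0.0
    for i in range(n):
        # One score per audio clip: the peak of its response map on image i.
        scores = [max(response_map(visual[i], audio[j])) / tau for j in range(n)]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        # Numerator: the true pair plus any hard positives.
        pos = sum(e for j, e in enumerate(exps) if j == i or hard_pos[i][j])
        total += -math.log(pos / sum(exps))
    return total / n

# Toy data (hypothetical): 3 images with 2 locations each, 2-D features.
visual = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.1, 0.9]],
]
audio = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]

no_hard = [[False] * 3 for _ in range(3)]
hard = [[False, True, False], [True, False, False], [False, False, False]]

baseline = loss_with_hard_positives(visual, audio, no_hard)
with_hard = loss_with_hard_positives(visual, audio, hard)
```

In this toy setup, audio clips 0 and 1 are nearly identical, so a plain contrastive loss penalizes image 0 for responding to clip 1. Flagging that pair as a hard positive moves its score into the numerator, which lowers the loss for those rows instead of pushing the pair apart.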
Pages: 4863-4867
Page count: 5