LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES

Cited by: 11
Authors
Senocak, Arda [1]
Ryu, Hyeonggon [1]
Kim, Junsik [2]
Kweon, In So [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Harvard Univ, Cambridge, MA 02138 USA
Funding
National Research Foundation of Singapore
Keywords
audio-visual learning; audio-visual sound localization; audio-visual correspondence; self-supervised
DOI
10.1109/ICASSP43922.2022.9747867
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The objective of this work is to localize sound sources in visual scenes. Existing audio-visual works employ contrastive learning, treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. However, these negative pairs may contain semantically matched audio-visual information, so such semantically correlated pairs, "hard positives", are mistakenly grouped as negatives. Our key contribution is showing that hard positives can give response maps similar to those of the corresponding pairs. Our approach incorporates these hard positives by adding their response maps directly into a contrastive learning objective. We demonstrate the effectiveness of our approach on the VGG-SS and SoundNet-Flickr test sets, showing performance that compares favorably with state-of-the-art methods.
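The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mean pooling of the response map, the temperature value, the function names, and the precomputed boolean matrix hard_pos (which marks the mismatched pairs treated as hard positives) are all assumptions made for illustration. The abstract does not say how hard positives are selected or how response maps are pooled.

```python
import numpy as np

def response_map(img_feat, aud_emb):
    """Cosine similarity between one audio embedding and every image location.

    img_feat: (H, W, C) visual feature map; aud_emb: (C,) audio embedding.
    Returns an (H, W) response map.
    """
    v = img_feat / (np.linalg.norm(img_feat, axis=-1, keepdims=True) + 1e-8)
    a = aud_emb / (np.linalg.norm(aud_emb) + 1e-8)
    return v @ a

def pair_score(img_feat, aud_emb):
    # Assumption: pool the response map by a plain spatial mean.
    return response_map(img_feat, aud_emb).mean()

def contrastive_loss(img_feats, aud_embs, hard_pos=None, tau=0.07):
    """InfoNCE-style loss in which hard positives count as positives.

    img_feats: (N, H, W, C); aud_embs: (N, C).
    hard_pos: (N, N) boolean, hard_pos[i, j] is True when audio j is treated
    as semantically matching image i (hypothetical, assumed to be given).
    """
    n = len(img_feats)
    scores = np.array([[pair_score(img_feats[i], aud_embs[j])
                        for j in range(n)] for i in range(n)])
    logits = scores / tau
    pos_mask = np.eye(n, dtype=bool)      # corresponding pairs
    if hard_pos is not None:
        pos_mask = pos_mask | hard_pos    # add hard positives
    # Subtract the row maximum for numerical stability.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    pos = (exp * pos_mask).sum(axis=1)
    return float(-np.log(pos / exp.sum(axis=1)).mean())
```

With hard_pos omitted, the function reduces to a standard contrastive loss in which every mismatched pair is a negative. Passing a mask moves the marked pairs from the negative side to the positive side of the objective, which is the correction the abstract describes.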
Pages: 4863-4867
Page count: 5