See the Sound, Hear the Pixels

Cited by: 0
Authors
Ramaswamy, Janani [1]
Das, Sukhendu [1]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Madras, Tamil Nadu, India
Keywords
SEPARATING STYLE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most events in the real world produce a sound that is associated with the corresponding visual scene. Humans have an inherent ability to map audio content to visual scenes automatically, which leads to an effortless and enhanced understanding of the underlying event. This raises an interesting question: can this natural correspondence between video and audio, which has been little explored so far, be learned by a machine and modeled jointly to localize the sound source in a visual scene? In this paper, we propose a novel algorithm that localizes the sound source in unconstrained videos using efficient fusion and attention mechanisms. Two novel blocks, the Audio Visual Fusion Block (AVFB) and the Segment-Wise Attention Block (SWAB), have been developed for this purpose. Quantitative and qualitative evaluations show that the same algorithm, with minor modifications, can perform sound localization under three different types of learning: supervised, weakly supervised and unsupervised. A novel Audio Visual Triplet Gram Matrix Loss (AVTGML) is proposed as the loss function for learning the localization in an unsupervised way. Our empirical evaluations demonstrate a significant increase in performance over the existing state-of-the-art methods, which attests to the strength of the proposed approach.
Pages: 2959 - 2968
Page count: 10
Related papers
50 in total
  • [41] Hear to see - See to hear: a Smart Home System User Interface for visually or hearing-impaired people
    Ciabattoni, L.
    Ferracuti, F.
    Foresi, G.
    Monteriu, A.
    2018 IEEE 8TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - BERLIN (ICCE-BERLIN), 2018
  • [42] To Hear or Not to Hear: Sound Availability Modulates Sensory-Motor Integration
    Camponogara, Ivan
    Turchet, Luca
    Carner, Marco
    Marchioni, Daniele
    Cesari, Paola
    FRONTIERS IN NEUROSCIENCE, 2016, 10
  • [43] Criminal procedure - See no evil, hear no evil
    Cole, D
    NATION, 2000, 271 (10) : 30 - 31
  • [44] Neither hear or see, what's happening?
    Perez-Chacon, P.
    Lara-Sanchez, H.
    Montero-Moreno, J. A.
    Hernandez-Herrero, M.
    EUROPEAN ANNALS OF OTORHINOLARYNGOLOGY-HEAD AND NECK DISEASES, 2024, 141 (04) : 251 - 252
  • [45] Can sea cucumber hear sound?
    Ichihashi, Kazuyoshi
    Amakawa, Taisaku
    Motokawa, Tatsuo
    Sanagawa, Hiroyuki
    Kuroki, Shinichiro
    Tohro, Minami
    Bando, Hajime
    Sakurai, Naoki
    COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY B-BIOCHEMISTRY & MOLECULAR BIOLOGY, 2006, 145 (3-4) : 408 - 408
  • [46] Convergence you can see, hear, and use
    McLaughlin, Patrick
    CONNECTOR SPECIFIER, 2008, 24 (06) : 7 - 7
  • [47] STEREO VECTORSCOPE - SEE WHAT YOU HEAR
    SMITH, SS
    DB-SOUND ENGINEERING MAGAZINE, 1977, 11 (10) : 48 - 52
  • [48] Sensory ecology: See me, hear me
    Ryan, Michael J.
    CURRENT BIOLOGY, 2007, 17 (23) : R1019 - R1021
  • [49] Cigarette smoking and adolescents: Messages they see and hear
    Crawford, MA
    PUBLIC HEALTH REPORTS, 2001, 116 : 203 - 215
  • [50] HEAR NO DATA, SEE NO DATA, SPEAK NO DATA
    CHRISTIANSEN, D
    IEEE SPECTRUM, 1982, 19 (05) : 37 - 37