See the Sound, Hear the Pixels

Cited by: 0

Authors:

Ramaswamy, Janani [1]

Das, Sukhendu [1]

Affiliations:

[1] IIT Madras, Dept Comp Sci & Engn, Visualizat & Percept Lab, Madras, Tamil Nadu, India

Source:

2020 IEEE Winter Conference on Applications of Computer Vision (WACV) | 2020

Keywords:

SEPARATING STYLE;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For every event occurring in the real world, most often a sound is associated with the corresponding visual scene. Humans possess an inherent ability to automatically map the audio content with visual scenes leading to an effortless and enhanced understanding of the underlying event. This triggers an interesting question: Can this natural correspondence between video and audio, which has been diminutively explored so far, be learned by a machine and modeled jointly to localize the sound source in a visual scene? In this paper, we propose a novel algorithm that addresses the problem of localizing sound source in unconstrained videos, which uses efficient fusion and attention mechanisms. Two novel blocks namely, Audio Visual Fusion Block (AVFB) and Segment-Wise Attention Block (SWAB) have been developed for this purpose. Quantitative and qualitative evaluations show that it is feasible to use the same algorithm with minor modifications to serve the purpose of sound localization using three different types of learning: supervised, weakly supervised and unsupervised. A novel Audio Visual Triplet Gram Matrix Loss (AVTGML) has been proposed as a loss function to learn the localization in an unsupervised way. Our empirical evaluations demonstrate a significant increase in performance over the existing state-of-the-art methods, serving as a testimony to the superiority of our proposed approach.

Pages: 2959-2968

Page count: 10
