Audio-Visual Fusion for Sound Source Localization and Improved Attention

Cited by: 0
Authors
Lee, Byoung-gi [1]
Choi, JongSuk [1]
Yoon, SangSuk [2]
Choi, Mun-Taek [2]
Kim, Munsang [2]
Kim, Daijin [3]
Affiliations
[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea
[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea
[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking
DOI
10.3795/KSME-A.2011.35.7.737
Chinese Library Classification number
TH [Machinery and Instrument Industry]
Discipline classification code
0802
Abstract
Service robots are equipped with various sensors, such as vision cameras, sonar sensors, laser scanners, and microphones. Although each of these sensors has its own function, some of them can be made to work together to perform more complicated functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also depend mainly on visual and auditory information in their daily lives. In this paper, we conduct two studies using audiovisual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.
Pages: 737-743
Number of pages: 7