Audio-Visual Fusion for Sound Source Localization and Improved Attention

Cited by: 0

Authors:

Lee, Byoung-gi [1]

Choi, JongSuk [1]

Yoon, SangSuk [2]

Choi, Mun-Taek [2]

Kim, Munsang [2]

Kim, Daijin [3]

Affiliations:

[1] Korea Inst Sci & Technol, Ctr Cognit Robot Res, Seoul, South Korea

[2] Korea Inst Sci & Technol, Ctr Intelligent Robot, Seoul, South Korea

[3] Postech, Dept Comp Sci & Engn, Pohang, South Korea

Source:

TRANSACTIONS OF THE KOREAN SOCIETY OF MECHANICAL ENGINEERS A | 2011 / Vol. 35 / No. 07

Keywords:

Audio-Vision Fusion; Sound Source Localization; Human Attention; Robot Tracking

DOI:

10.3795/KSME-A.2011.35.7.737

Chinese Library Classification (CLC) number:

TH [Machinery and Instrument Industry]

Discipline classification code:

0802

Abstract:

Service robots are equipped with various sensors, such as vision cameras, sonar sensors, laser scanners, and microphones. Although each sensor has its own function, some of them can be made to work together to perform more complex functions. Audiovisual fusion is a typical and powerful combination of audio and video sensors, because audio and visual information complement each other. Human beings likewise depend mainly on visual and auditory information in their daily lives. In this paper, we conduct two studies using audiovisual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.

Pages: 737-743

Page count: 7
