Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Cited by: 0
Authors
Chen, Jinyu [1]
Wang, Wenguan [2]
Liu, Si [1]
Li, Hongsheng [3]
Yang, Yi [2]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[2] Zhejiang Univ, ReLER, CCAI, Hangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
DOI
10.1109/ICCV51070.2023.01009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Audio-visual navigation is an audio-targeted wayfinding task in which a robot agent must traverse a never-before-seen 3D environment to reach a sounding source. In this article, we present ORAN, an omnidirectional audio-visual navigator based on cross-task navigation skill transfer. In particular, ORAN sharpens the two basic abilities that such a challenging task requires, namely wayfinding and audio-visual information gathering. First, ORAN is trained with a confidence-aware cross-task policy distillation (CCPD) strategy. CCPD transfers the fundamental point-to-point wayfinding skill, which is well trained on the large-scale Point-Goal task, to ORAN, helping ORAN master audio-visual navigation with far fewer training samples. To improve the efficiency of knowledge transfer and address the domain gap, CCPD adapts to the decision confidence of the teacher policy. Second, ORAN is equipped with an omnidirectional information gathering (OIG) mechanism, i.e., it gleans visual-acoustic observations from different directions before making a decision. As a result, ORAN yields more robust navigation behaviour. With CCPD and OIG combined, ORAN significantly outperforms previous competitors. After model ensembling, ORAN ranked first in the SoundSpaces Challenge 2022, with relative improvements of 53% in SPL and 35% in SR.
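The abstract states only that CCPD adapts to the decision confidence of the teacher policy; it does not give the loss. The following is a minimal sketch of one plausible reading, not the authors' formulation: a KL-divergence distillation term scaled by a teacher-confidence weight. The entropy-based confidence measure and all function names here are assumptions introduced for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_confidence(probs):
    """Assumed confidence measure: 1 - normalized entropy.

    Close to 1.0 when the teacher is nearly certain of one action,
    0.0 when its action distribution is uniform.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def confidence_weighted_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student), down-weighted when the teacher is unsure.

    Hypothetical stand-in for a confidence-aware distillation term:
    steps where the Point-Goal teacher is undecided contribute little.
    """
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    kl = sum(tp * math.log(tp / sp) for tp, sp in zip(t, s) if tp > 0.0)
    return teacher_confidence(t) * kl
```

Under this assumed weighting, a teacher with a uniform action distribution contributes no distillation signal, whereas a confident teacher pulls the student strongly toward its preferred action.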
Pages: 10959-10969
Number of pages: 11