Goal Object Grounding and Multimodal Mapping for Multi-object Visual Navigation

Authors
Choi, Jeonghyun [1]
Kim, Incheol [1]
Affiliations
[1] Department of Computer Science, Kyonggi University, Republic of Korea
Keywords
Deep neural networks - Mapping - Navigation - Simulation platform - Visual languages - Zero-shot learning
DOI
10.5302/J.ICROS.2024.23.0217
Abstract
Multi-object visual navigation (MultiON) is a special type of visual navigation task that requires an embodied agent to visit multiple goal objects distributed over an unseen three-dimensional (3D) environment in a predefined order. To successfully execute MultiON, an agent should be able to accurately ground individual goal objects based on language descriptions regarding their color and shape attributes and build a semantically rich map that effectively covers the entire environment. In this paper, we propose a novel deep neural network-based agent model for performing MultiON tasks. The proposed model provides unique solutions to three different issues regarding MultiON agent design. First, the model adopts the pre-trained Grounding DINO module to ground the language descriptions of goal objects to the visual objects in input images in a zero-shot manner. Moreover, the model uses Bayesian posterior probabilities to effectively register the uncertain local contexts extracted from input images onto the global map. Finally, the model applies a novel reward function to efficiently motivate the agent to explore unvisited areas in the given environment for rapid and accurate map expansion. We demonstrate the superiority of the proposed model by conducting various quantitative and qualitative experiments using the 3D simulation platform, AI-Habitat, and the benchmark scene dataset, Matterport3D. © ICROS 2024.
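The abstract does not spell out how Bayesian posterior probabilities are used to register uncertain local observations onto the global map. As a rough illustration only, and not the authors' method, the sketch below applies a generic per-cell Bayes update to a categorical class belief. The function name `bayes_update`, the 2x2 grid, the three classes, and the likelihood values are all hypothetical.

```python
import numpy as np


def bayes_update(prior, likelihood):
    """Fuse one observation into a cell's class belief with Bayes' rule.

    prior:      current class probabilities for a map cell, shape (C,)
    likelihood: assumed probability of the observation under each class, shape (C,)
    """
    unnormalized = prior * likelihood
    total = unnormalized.sum(axis=-1, keepdims=True)
    # Guard against division by zero; if the observation rules out
    # every class, fall back to the prior unchanged.
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, unnormalized / safe_total, prior)


# Hypothetical global map: 2x2 cells, 3 object classes, uniform prior.
num_classes = 3
global_map = np.full((2, 2, num_classes), 1.0 / num_classes)

# Hypothetical detector likelihood observed twice for cell (0, 1).
observation = np.array([0.7, 0.2, 0.1])
for _ in range(2):
    global_map[0, 1] = bayes_update(global_map[0, 1], observation)

print(global_map[0, 1])
```

Repeated consistent observations sharpen the belief for the observed cell toward the first class, while cells that were not observed keep their uniform prior.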
Pages: 596 - 606
Related papers
  • [1] Feature Compression for Multimodal Multi-Object Tracking
    Li, Xinlin
    Hanna, Osama A.
    Fragouli, Christina
    Diggavi, Suhas
    Verma, Gunjan
    Bhattacharyya, Joydeep
    [J]. MILCOM 2023 - 2023 IEEE MILITARY COMMUNICATIONS CONFERENCE, 2023
  • [2] Learned Filters for Object Detection in Multi-object Visual Tracking
    Stamatescu, Victor
    Wong, Sebastien
    McDonnell, Mark D.
    Kearney, David
    [J]. AUTOMATIC TARGET RECOGNITION XXVI, 2016, 9844
  • [3] Learning Active Camera for Multi-Object Navigation
    Chen, Peihao
    Ji, Dongyu
    Lin, Kunyang
    Hu, Weiwen
    Huang, Wenbing
    Li, Thomas H.
    Tan, Mingkui
    Gan, Chuang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [4] Sequence-Agnostic Multi-Object Navigation
    Gireesh, Nandiraju
    Agrawal, Ayush
    Datta, Ahana
    Banerjee, Snehasis
    Sridharan, Mohan
    Bhowmick, Brojeshwar
    Krishna, Madhava
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023 : 9573 - 9579
  • [5] Zero-Shot Object Goal Visual Navigation
    Zhao, Qianfan
    Zhang, Lu
    He, Bin
    Qiao, Hong
    Liu, Zhiyong
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023 : 2025 - 2031
  • [6] HOG Based Multi-object Detection for Urban Navigation
    Chayeb, A.
    Ouadah, N.
    Tobal, Z.
    Lakrouf, M.
    Azouaoui, O.
    [J]. 2014 IEEE 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2014 : 2962 - 2967
  • [7] Object Memory Transformer for Object Goal Navigation
    Fukushima, Rui
    Ota, Kei
    Kanezaki, Asako
    Sasaki, Yoko
    Yoshiyasu, Yusuke
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022 : 11288 - 11294
  • [8] Online Multi-Object Tracking With Visual and Radar Features
    Bae, Seung-Hwan
    [J]. IEEE ACCESS, 2020, 8 (08) : 90324 - 90339
  • [9] Observational window effects on multi-object reverberation mapping
    Malik, Umang
    Sharp, Rob
    Martini, Paul
    Davis, Tamara M.
    Tucker, Brad E.
    Yu, Zhefu
    Penton, Andrew
    Lewis, Geraint F.
    Calcino, Josh
    [J]. MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2022, 516 (03) : 3238 - 3253
  • [10] Fuzzy logic approach to visual multi-object tracking
    Li Liang-qun
    Zhan Xi-yang
    Liu Zong-xiang
    Xie Wei-xin
    [J]. NEUROCOMPUTING, 2018, 281 : 139 - 151