Goal Object Grounding and Multimodal Mapping for Multi-object Visual Navigation

Cited by: 0

Authors:

Choi, Jeonghyun [1]

Kim, Incheol [1]

Affiliation:

[1] Department of Computer Science, Kyonggi University, Republic of Korea

Source:

Journal of Institute of Control, Robotics and Systems | 2024, Vol. 30, No. 6

Keywords:

Deep neural networks - Mapping - Navigation - Simulation platform - Visual languages - Zero-shot learning

DOI:

10.5302/J.ICROS.2024.23.0217

Abstract:

Multi-object visual navigation (MultiON) is a visual navigation task that requires an embodied agent to visit multiple goal objects, distributed over an unseen three-dimensional (3D) environment, in a predefined order. To execute MultiON successfully, an agent should accurately ground individual goal objects from language descriptions of their color and shape attributes, and build a semantically rich map that covers the entire environment. In this paper, we propose a novel deep neural network-based agent model for performing MultiON tasks. The proposed model addresses three issues in MultiON agent design. First, the model adopts the pre-trained Grounding DINO module to ground the language descriptions of goal objects to the visual objects in input images in a zero-shot manner. Second, the model uses Bayesian posterior probabilities to register the uncertain local contexts extracted from input images onto the global map. Third, the model applies a novel reward function that motivates the agent to explore unvisited areas of the given environment for rapid and accurate map expansion. We demonstrate the superiority of the proposed model by conducting various quantitative and qualitative experiments using the 3D simulation platform, AI-Habitat, and the benchmark scene dataset, Matterport3D. © ICROS 2024.
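
The abstract does not give the form of the Bayesian map update. As a rough illustration only, and not the authors' formulation, the following sketch fuses a per-cell class prior on a grid map with an uncertain observation using Bayes' rule. The function name `bayes_update`, the map shape, and the three-class layout are all assumptions made for this example.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Fuse a per-cell class prior with an observation likelihood.

    prior:      (H, W, C) array; each cell sums to 1 over C classes.
    likelihood: (H, W, C) array; p(observation | class) for each cell.
    Returns the normalized posterior, same shape as the prior.
    """
    unnorm = prior * likelihood
    total = unnorm.sum(axis=-1, keepdims=True)
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    # Cells whose evidence sums to zero keep their prior unchanged.
    return np.where(total > 0, unnorm / safe_total, prior)

# Hypothetical 2x2 map with 3 classes (e.g. background, goal A, goal B).
global_map = np.full((2, 2, 3), 1.0 / 3.0)

# Uninformative observation everywhere except one observed cell.
obs = np.ones((2, 2, 3))
obs[0, 0] = [0.7, 0.2, 0.1]

# Registering the same observation twice sharpens the belief in that cell.
global_map = bayes_update(global_map, obs)
global_map = bayes_update(global_map, obs)
```

In this sketch, repeated consistent observations concentrate probability mass on one class, while cells that receive no informative observation stay at their prior. That is one simple way to accumulate uncertain local detections on a global map.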

Pages: 596-606
