Goal Object Grounding and Multimodal Mapping for Multi-object Visual Navigation

Cited by: 0
|
Authors
Choi, Jeonghyun [1 ]
Kim, Incheol [1 ]
Affiliations
[1] Department of Computer Science, Kyonggi University, Republic of Korea
Keywords
Deep neural networks - Mapping - Navigation - Simulation platform - Visual languages - Zero-shot learning;
DOI
10.5302/J.ICROS.2024.23.0217
Abstract
Multi-object visual navigation (MultiON) is a special type of visual navigation task that requires an embodied agent to visit multiple goal objects distributed over an unseen three-dimensional (3D) environment in a predefined order. To successfully execute MultiON, an agent should be able to accurately ground individual goal objects based on language descriptions regarding their color and shape attributes and build a semantically rich map that effectively covers the entire environment. In this paper, we propose a novel deep neural network-based agent model for performing MultiON tasks. The proposed model provides unique solutions to three different issues regarding MultiON agent design. First, the model adopts the pre-trained Grounding DINO module to ground the language descriptions of goal objects to the visual objects in input images in a zero-shot manner. Moreover, the model uses Bayesian posterior probabilities to effectively register the uncertain local contexts extracted from input images onto the global map. Finally, the model applies a novel reward function to efficiently motivate the agent to explore unvisited areas in the given environment for rapid and accurate map expansion. We demonstrate the superiority of the proposed model by conducting various quantitative and qualitative experiments using the 3D simulation platform, AI-Habitat, and the benchmark scene dataset, Matterport3D. © ICROS 2024.
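The abstract describes registering uncertain local contexts onto the global map via Bayesian posterior probabilities. The sketch below illustrates that general idea for a single map cell; the function names, the three-class setup, and the detector confidence model are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over semantic classes for one map cell: P(c|z) ∝ P(z|c) P(c).

    Both arguments are length-K arrays over K classes; the result is
    renormalized so it sums to 1. (Hypothetical helper, for illustration.)
    """
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Example: 3 semantic classes, uniform prior for an unobserved cell.
prior = np.full(3, 1.0 / 3.0)

# An uncertain local observation: the detector reports class 0 with
# confidence 0.8, with the remaining mass spread over the other classes.
likelihood = np.array([0.8, 0.1, 0.1])

posterior = bayes_update(prior, likelihood)
# With a uniform prior, the posterior simply mirrors the likelihood,
# so the uncertain detection is registered onto the map as [0.8, 0.1, 0.1].
```

Repeating this update as new observations of the same cell arrive accumulates evidence, so confident, consistent detections sharpen the map while noisy one-off detections are gradually washed out.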
Pages: 596-606