VME-Transformer: Enhancing Visual Memory Encoding for Navigation in Interactive Environments

Times cited: 5
Authors
Shen, Jiwei [1]
Lou, Pengjie [1]
Yuan, Liang [2]
Lyu, Shujing [1]
Lu, Yue [1]
Affiliations
[1] East China Normal Univ, Sch Commun & Elect Engn, Shanghai Key Lab Multidimens Informat Proc, Shanghai 200241, Peoples R China
[2] Beijing Univ Chem Technol, Beijing Adv Innovat Ctr Soft Matter Sci & Engn, Chaoyang 100013, Peoples R China
Keywords
Visual interactive navigation; reinforcement learning; long-term memory encoding; transformer
DOI
10.1109/LRA.2023.3333238
Chinese Library Classification (CLC) number
TP24 [Robotics technology]
Discipline classification codes
080202; 1405
Abstract
The efficiency of a robotic system is primarily determined by its ability to navigate complex and interactive environments. In real-world scenarios, cluttered surroundings are common, requiring a robot to navigate diverse spaces and displace objects to clear a path toward its goal. Consequently, "Visual Interactive Navigation" presents several challenges, including how to retain historical exploration information from partially observable visual signals, and how to use sparse rewards in reinforcement learning to simultaneously learn a latent representation and a control policy. To address these challenges, we introduce a Transformer-based Visual Memory Encoder (VME-Transformer), capable of embedding both recent and long-term exploration information into memory. Additionally, we explicitly estimate the robot's next pose, conditioned on the impending action, to bootstrap the learning process of the high-capacity VME-Transformer. We further regularize the value function by introducing input perturbations, thereby enhancing its generalization to previously unseen environments. In the Visual Interactive Navigation tasks within the iGibson environment, the VME-Transformer outperforms state-of-the-art methods, underlining its effectiveness.
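The abstract does not say which perturbation or regularizer is used on the value function. The sketch below shows one common way such a term can be written, assuming i.i.d. Gaussian noise on the input features and a squared-difference consistency penalty added to the usual regression-to-returns loss. The linear `value` head and the names `perturb`, `regularized_value_loss`, `sigma`, and `lam` are hypothetical illustrations, not the VME-Transformer's actual critic.

```python
import random


def value(weights, obs):
    """Hypothetical linear value head V(s) = w . s, a stand-in for the real critic."""
    return sum(w * x for w, x in zip(weights, obs))


def perturb(obs, sigma, rng):
    """Add i.i.d. Gaussian noise to every input feature (assumed perturbation type)."""
    return [x + rng.gauss(0.0, sigma) for x in obs]


def regularized_value_loss(weights, batch, returns, sigma=0.1, lam=0.5, seed=0):
    """Critic loss = MSE(V(s), return) + lam * MSE(V(s), V(perturbed s)).

    The consistency term penalizes value estimates that change sharply under
    small input perturbations, one way to encourage a smoother value function.
    All names and the specific loss form are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    n = len(batch)
    regression = 0.0
    consistency = 0.0
    for obs, ret in zip(batch, returns):
        v = value(weights, obs)
        v_noisy = value(weights, perturb(obs, sigma, rng))
        regression += (v - ret) ** 2
        consistency += (v - v_noisy) ** 2
    return regression / n + lam * consistency / n
```

With `sigma=0.0` the consistency term vanishes and the loss reduces to plain mean squared error against the returns; raising `lam` trades fit to the returns for robustness to input noise.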
Pages: 643–650
Number of pages: 8