TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes

Cited: 2
Authors
Tang, Shengjun [1 ,2 ]
Li, Yusong [3 ]
Wan, Jiawei [1 ]
Li, You [4 ]
Zhou, Baoding [5 ]
Guo, Renzhong [1 ,2 ]
Wang, Weixi [1 ,2 ]
Feng, Yuhong [3 ]
Affiliations
[1] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen, Peoples R China
[2] State Key Lab Subtrop Bldg & Urban Sci, Guangzhou, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[4] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen, Peoples R China
[5] Shenzhen Univ, Coll Civil & Transportat Engn, Shenzhen, Peoples R China
Keywords
Indoor localization; Feature learning; Structure from motion; Levenberg-Marquardt; Image retrieval
DOI
10.1016/j.isprsjprs.2023.12.006
Chinese Library Classification (CLC)
P9 [Physical Geography]
Discipline codes
0705; 070501
Abstract
Accurate localization in GPS-denied environments has long been a core problem in computer vision and robotics. In indoor environments, vision-based localization methods are susceptible to changes in lighting, viewpoint, and scene content, which can cause localization failures or limit generalization. In this paper, we propose the TransCNNLoc framework, an encoding-decoding network designed to learn more robust image features for camera pose estimation. In the feature encoding stage, a CNN and a Swin Transformer are integrated into the image feature encoding module, enabling the network to extract both global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections while per-pixel feature weight maps are computed. To enhance robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a coarse-to-fine multi-level iterative optimization recovers the six-degree-of-freedom camera pose. Experiments were conducted on the publicly available 7-Scenes dataset as well as on a dataset collected under changing lighting conditions and dynamic scenes. The results demonstrate that TransCNNLoc adapts better to dynamic scenes and lighting changes than existing methods: on the static scenes of the public dataset it achieves an accuracy of up to 5 cm, outperforming competing approaches in most scenarios, and under dynamic scenes and fluctuating illumination it reaches an accuracy of up to 3 cm. This represents an improvement from decimeter-level to centimeter-level precision, a significant advance over existing state-of-the-art (SOTA) algorithms. The open-source repository for the proposed method is available at github.com/Geelooo/TransCNNloc.
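The abstract's final step, a weighted iterative optimization of the 6-DoF camera pose (Levenberg-Marquardt appears among the keywords), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lm_refine`, `project`), the pinhole intrinsics, the finite-difference Jacobian, and the synthetic data are all assumptions, and the real method iterates over learned feature residuals at multiple pyramid levels rather than point reprojections. The `weights` array stands in for the per-pixel reliability map that downweights dynamic objects.

```python
import numpy as np

def rodrigues(w):
    """Axis-angle vector (3,) -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(pose, pts3d, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of 3-D points under pose = (axis-angle, translation)."""
    pc = pts3d @ rodrigues(pose[:3]).T + pose[3:]
    return np.stack([fx * pc[:, 0] / pc[:, 2] + cx,
                     fy * pc[:, 1] / pc[:, 2] + cy], axis=1)

def lm_refine(pose0, pts3d, obs2d, weights, iters=25, lam=1e-3):
    """Weighted Levenberg-Marquardt refinement of a 6-DoF pose.

    Low entries in `weights` suppress unreliable (e.g. dynamic) points;
    the Jacobian is formed by finite differences for brevity."""
    pose = pose0.astype(float).copy()

    def residual(p):
        return ((project(p, pts3d) - obs2d) * weights[:, None]).ravel()

    err = residual(pose)
    for _ in range(iters):
        J = np.empty((err.size, 6))
        for j in range(6):                 # numerical Jacobian, one column per DoF
            d = np.zeros(6)
            d[j] = 1e-6
            J[:, j] = (residual(pose + d) - err) / 1e-6
        H = J.T @ J + lam * np.eye(6)      # damped normal equations
        step = np.linalg.solve(H, -J.T @ err)
        new_err = residual(pose + step)
        if new_err @ new_err < err @ err:  # accept step, relax damping
            pose, err, lam = pose + step, new_err, lam * 0.5
        else:                              # reject step, raise damping
            lam *= 5.0
    return pose

# Synthetic check: recover a known pose from exact 2-D observations.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 6.0], (30, 3))
gt_pose = np.array([0.10, -0.05, 0.02, 0.10, 0.20, -0.10])
obs2d = project(gt_pose, pts3d)
est = lm_refine(np.zeros(6), pts3d, obs2d, np.ones(len(pts3d)))
```

In the paper's coarse-to-fine scheme, a loop like this would run first on downsampled feature maps to absorb large pose errors, then on progressively finer levels.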
Pages: 218-230 (13 pages)