TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes

Cited by: 2
Authors
Tang, Shengjun [1 ,2 ]
Li, Yusong [3 ]
Wan, Jiawei [1 ]
Li, You [4 ]
Zhou, Baoding [5 ]
Guo, Renzhong [1 ,2 ]
Wang, Weixi [1 ,2 ]
Feng, Yuhong [3 ]
Affiliations
[1] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen, Peoples R China
[2] State Key Lab Subtrop Bldg & Urban Sci, Guangzhou, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[4] Guangdong Lab Artificial Intelligence & Digital Economy, Shenzhen, Peoples R China
[5] Shenzhen Univ, Coll Civil & Transportat Engn, Shenzhen, Peoples R China
Keywords
Indoor localization; Feature learning; Structure from motion; Levenberg-Marquardt; Image retrieval;
DOI
10.1016/j.isprsjprs.2023.12.006
CLC classification
P9 [Physical Geography];
Discipline code
0705; 070501;
Abstract
Accurate localization in GPS-denied environments has always been a core issue in computer vision and robotics research. In indoor environments, vision-based localization methods are susceptible to changes in lighting conditions, viewing angles, and other environmental factors, which leads to localization failures or limited generalization. In this paper, we propose the TransCNNLoc framework, an encoding-decoding network designed to learn more robust image features for camera pose estimation. In the feature encoding stage, a CNN and a Swin Transformer are integrated into the image feature encoding module, enabling the network to extract both global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections while per-pixel feature weight maps are computed. To enhance the framework's robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a coarse-to-fine multi-level iterative optimization recovers the six-degree-of-freedom (6-DoF) camera pose. Experiments were conducted on the publicly available 7-Scenes dataset as well as on a dataset collected under changing lighting conditions and in dynamic scenes. The results demonstrate that TransCNNLoc adapts better to dynamic scenes and lighting changes than existing approaches. On the static scenes of the public dataset, the proposed method achieves an accuracy of up to 5 cm and delivers the best results in the majority of scenarios. Under dynamic scenes and fluctuating illumination, it reaches an accuracy of up to 3 cm, improving localization from the decimeter level to the centimeter level compared with existing state-of-the-art (SOTA) algorithms. The open-source implementation is available at github.com/Geelooo/TransCNNloc.
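The encoder described above fuses CNN local features with transformer-derived global context and decodes per-pixel features together with a weight map. The snippet below is a minimal, hypothetical PyTorch sketch of that idea only, not the authors' network: a plain self-attention encoder stands in for the Swin Transformer's shifted-window attention, the multi-level decoder and the dynamic-object branch are omitted, and all names (FeatureWeightNet, feat_head, weight_head) are invented for illustration.

```python
import torch
import torch.nn as nn

class FeatureWeightNet(nn.Module):
    """Toy encoder: CNN local features + transformer global context,
    decoded into a per-pixel feature map and a per-pixel weight map."""
    def __init__(self, dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                      # local feature branch
            nn.Conv2d(3, dim, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)  # global context branch
        self.feat_head = nn.Conv2d(dim, dim, 1)                        # per-pixel features
        self.weight_head = nn.Sequential(nn.Conv2d(dim, 1, 1), nn.Sigmoid())  # weights in [0, 1]

    def forward(self, img):
        local = self.cnn(img)                          # B x C x H/4 x W/4
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)      # B x HW x C token sequence
        glob = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        fused = local + glob                           # fuse local and global cues
        return self.feat_head(fused), self.weight_head(fused)

net = FeatureWeightNet()
feats, weights = net(torch.randn(1, 3, 224, 224))
print(feats.shape, weights.shape)    # (1, 64, 56, 56) and (1, 1, 56, 56)
```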
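The pose recovery step is described as a weighted, coarse-to-fine iterative optimization, and the keywords list Levenberg-Marquardt. The sketch below shows one damped, weighted Gauss-Newton/Levenberg-Marquardt pose update on a toy problem; it substitutes plain 2D reprojection residuals with hand-set per-point weights for the paper's feature-metric residuals with learned per-pixel weights, and the helper names (se3_exp, project, lm_step) are hypothetical.

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6-vector (rho, phi) to a 4x4 pose matrix."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-8:
        R, V = np.eye(3) + K, np.eye(3)
    else:
        R = np.eye(3) + np.sin(theta) / theta * K + (1 - np.cos(theta)) / theta**2 * K @ K
        V = np.eye(3) + (1 - np.cos(theta)) / theta**2 * K + (theta - np.sin(theta)) / theta**3 * K @ K
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def project(T, X, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 world points X under pose T (world -> camera)."""
    Xc = (T[:3, :3] @ X.T).T + T[:3, 3]
    return np.stack([fx * Xc[:, 0] / Xc[:, 2] + cx,
                     fy * Xc[:, 1] / Xc[:, 2] + cy], axis=1)

def residuals(xi, X, uv_obs):
    """Stacked 2D reprojection residuals for the pose exp(xi)."""
    return (project(se3_exp(xi), X) - uv_obs).ravel()

def lm_step(xi, X, uv_obs, w, lam=1e-3, eps=1e-6):
    """One Levenberg-Marquardt update with a diagonal per-point weight matrix."""
    r = residuals(xi, X, uv_obs)
    J = np.zeros((r.size, 6))
    for k in range(6):                       # numerical Jacobian, column by column
        d = np.zeros(6)
        d[k] = eps
        J[:, k] = (residuals(xi + d, X, uv_obs) - r) / eps
    W = np.diag(np.repeat(w, 2))             # per-point weight applied to both u and v
    H = J.T @ W @ J + lam * np.eye(6)        # damped normal equations
    return xi - np.linalg.solve(H, J.T @ W @ r)

# Toy scene: random 3D points, a ground-truth pose, and an identity initialization.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))
T_gt = se3_exp(np.array([0.05, -0.02, 0.10, 0.02, -0.01, 0.03]))
uv_obs = project(T_gt, X)
w = np.ones(len(X))                          # points flagged as dynamic would get lower weights
xi = np.zeros(6)
for _ in range(10):
    xi = lm_step(xi, X, uv_obs, w)
print("pose error:", np.linalg.norm(se3_exp(xi) - T_gt))
```

In the paper's coarse-to-fine setting such updates would be run first on low-resolution feature maps and then refined at higher resolutions; a single level is shown here for brevity.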
Pages: 218-230
Page count: 13
Related papers
50 records
  • [41] 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images
    Wu, Di
    Zhuang, Zhaoyong
    Xiang, Canqun
    Zou, Wenbin
    Li, Xia
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1238 - 1247
  • [42] Pose2Sim: An End-to-End Workflow for 3D Markerless Sports Kinematics-Part 1: Robustness
    Pagnon, David
    Domalain, Mathieu
    Reveret, Lionel
    SENSORS, 2021, 21 (19)
  • [43] VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
    Zhou, Yin
    Tuzel, Oncel
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4490 - 4499
  • [45] End-to-end learning of 3D phase-only holograms for holographic display
    Shi, Liang
    Li, Beichen
    Matusik, Wojciech
    LIGHT-SCIENCE & APPLICATIONS, 2022, 11 (01)
  • [46] 2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision
    Pan, Jin
    Mu, Xiangru
    Qin, Tong
    Xu, Chunjing
    Yang, Ming
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 906 - 912
  • [47] End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands
    Jafarzadeh, Mohsen
    Tadesse, Yonas
    2020 SECOND INTERNATIONAL CONFERENCE ON TRANSDISCIPLINARY AI (TRANSAI 2020), 2020, : 25 - 33
  • [48] PWOC-3D: Deep Occlusion-Aware End-to-End Scene Flow Estimation
    Saxena, Rohan
    Schuster, Rene
    Wasenmueller, Oliver
    Stricker, Didier
    2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019, : 324 - 331
  • [49] 2D and 3D CMOS MAPS with high performance pixel-level signal processing
    Traversi, Gianluca
    Gaioni, Luigi
    Manghisoni, Massimo
    Ratti, Lodovico
    Re, Valerio
    NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2011, 628 (01): : 212 - 215
  • [50] A Geometric Knowledge Oriented Single-Frame 2D-to-3D Human Absolute Pose Estimation Method
    Hu, Mengxian
    Liu, Chengju
    Li, Shu
    Yan, Qingqing
    Fang, Qin
    Chen, Qijun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7282 - 7295