TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes

Cited by: 2
Authors
Tang, Shengjun [1 ,2 ]
Li, Yusong [3 ]
Wan, Jiawei [1 ]
Li, You [4 ]
Zhou, Baoding [5 ]
Guo, Renzhong [1 ,2 ]
Wang, Weixi [1 ,2 ]
Feng, Yuhong [3 ]
Affiliations
[1] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen, Peoples R China
[2] State Key Lab Subtrop Bldg & Urban Sci, Guangzhou, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[4] Guangdong Lab Artificial Intelligence & Digital Economy, Shenzhen, Peoples R China
[5] Shenzhen Univ, Coll Civil & Transportat Engn, Shenzhen, Peoples R China
Keywords
Indoor localization; Feature learning; Structure from motion; Levenberg-Marquardt; Image retrieval;
DOI
10.1016/j.isprsjprs.2023.12.006
CLC classification
P9 [Physical Geography];
Discipline code
0705; 070501;
Abstract
Accurate localization in GPS-denied environments has always been a core issue in computer vision and robotics research. In indoor environments, vision-based localization methods are susceptible to changes in lighting conditions, viewing angles, and other environmental factors, which leads to localization failures or limited generalization. In this paper, we propose the TransCNNLoc framework, an encoding-decoding network designed to learn more robust image features for camera pose estimation. In the feature encoding stage, a CNN and a Swin Transformer are integrated into the image feature encoding module, enabling the network to extract both global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections while per-pixel feature weight maps are computed. To enhance the framework's robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a coarse-to-fine multi-level iterative optimization recovers the six-degree-of-freedom (6-DoF) camera pose. Experiments were conducted on the publicly available 7-Scenes dataset as well as on a dataset collected under changing lighting conditions and in dynamic scenes. The results demonstrate that TransCNNLoc adapts better to dynamic scenes and lighting changes than existing approaches. On the static scenes of the public dataset, the proposed method achieves an accuracy of up to 5 cm and delivers the best results in the majority of scenarios. Under dynamic scenes and fluctuating illumination, it reaches an accuracy of up to 3 cm, improving localization from the decimeter level to the centimeter level compared with existing state-of-the-art (SOTA) algorithms. The open-source implementation is available at github.com/Geelooo/TransCNNloc.
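The encoder described above fuses CNN local features with transformer-derived global context and decodes per-pixel features together with a weight map. The snippet below is a minimal, hypothetical PyTorch sketch of that idea only, not the authors' network: a plain self-attention encoder stands in for the Swin Transformer's shifted-window attention, the multi-level decoder and the dynamic-object branch are omitted, and all names (FeatureWeightNet, feat_head, weight_head) are invented for illustration.

```python
import torch
import torch.nn as nn

class FeatureWeightNet(nn.Module):
    """Toy encoder: CNN local features + transformer global context,
    decoded into a per-pixel feature map and a per-pixel weight map."""
    def __init__(self, dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                      # local feature branch
            nn.Conv2d(3, dim, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)  # global context branch
        self.feat_head = nn.Conv2d(dim, dim, 1)                        # per-pixel features
        self.weight_head = nn.Sequential(nn.Conv2d(dim, 1, 1), nn.Sigmoid())  # weights in [0, 1]

    def forward(self, img):
        local = self.cnn(img)                          # B x C x H/4 x W/4
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)      # B x HW x C token sequence
        glob = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        fused = local + glob                           # fuse local and global cues
        return self.feat_head(fused), self.weight_head(fused)

net = FeatureWeightNet()
feats, weights = net(torch.randn(1, 3, 224, 224))
print(feats.shape, weights.shape)    # (1, 64, 56, 56) and (1, 1, 56, 56)
```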
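The pose recovery step is described as a weighted, coarse-to-fine iterative optimization, and the keywords list Levenberg-Marquardt. The sketch below shows one damped, weighted Gauss-Newton/Levenberg-Marquardt pose update on a toy problem; it substitutes plain 2D reprojection residuals with hand-set per-point weights for the paper's feature-metric residuals with learned per-pixel weights, and the helper names (se3_exp, project, lm_step) are hypothetical.

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6-vector (rho, phi) to a 4x4 pose matrix."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-8:
        R, V = np.eye(3) + K, np.eye(3)
    else:
        R = np.eye(3) + np.sin(theta) / theta * K + (1 - np.cos(theta)) / theta**2 * K @ K
        V = np.eye(3) + (1 - np.cos(theta)) / theta**2 * K + (theta - np.sin(theta)) / theta**3 * K @ K
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def project(T, X, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 world points X under pose T (world -> camera)."""
    Xc = (T[:3, :3] @ X.T).T + T[:3, 3]
    return np.stack([fx * Xc[:, 0] / Xc[:, 2] + cx,
                     fy * Xc[:, 1] / Xc[:, 2] + cy], axis=1)

def residuals(xi, X, uv_obs):
    """Stacked 2D reprojection residuals for the pose exp(xi)."""
    return (project(se3_exp(xi), X) - uv_obs).ravel()

def lm_step(xi, X, uv_obs, w, lam=1e-3, eps=1e-6):
    """One Levenberg-Marquardt update with a diagonal per-point weight matrix."""
    r = residuals(xi, X, uv_obs)
    J = np.zeros((r.size, 6))
    for k in range(6):                       # numerical Jacobian, column by column
        d = np.zeros(6)
        d[k] = eps
        J[:, k] = (residuals(xi + d, X, uv_obs) - r) / eps
    W = np.diag(np.repeat(w, 2))             # per-point weight applied to both u and v
    H = J.T @ W @ J + lam * np.eye(6)        # damped normal equations
    return xi - np.linalg.solve(H, J.T @ W @ r)

# Toy scene: random 3D points, a ground-truth pose, and an identity initialization.
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))
T_gt = se3_exp(np.array([0.05, -0.02, 0.10, 0.02, -0.01, 0.03]))
uv_obs = project(T_gt, X)
w = np.ones(len(X))                          # points flagged as dynamic would get lower weights
xi = np.zeros(6)
for _ in range(10):
    xi = lm_step(xi, X, uv_obs, w)
print("pose error:", np.linalg.norm(se3_exp(xi) - T_gt))
```

In the paper's coarse-to-fine setting such updates would be run first on low-resolution feature maps and then refined at higher resolutions; a single level is shown here for brevity.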
Pages: 218-230
Page count: 13
Related papers
50 records
  • [41] 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images
    Wu, Di
    Zhuang, Zhaoyong
    Xiang, Canqun
    Zou, Wenbin
    Li, Xia
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1238 - 1247
  • [42] Pose2Sim: An End-to-End Workflow for 3D Markerless Sports Kinematics-Part 1: Robustness
    Pagnon, David
    Domalain, Mathieu
    Reveret, Lionel
    SENSORS, 2021, 21 (19)
  • [43] VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
    Zhou, Yin
    Tuzel, Oncel
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4490 - 4499
  • [45] End-to-end learning of 3D phase-only holograms for holographic display
    Shi, Liang
    Li, Beichen
    Matusik, Wojciech
    LIGHT-SCIENCE & APPLICATIONS, 2022, 11 (01)
  • [46] 2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision
    Pan, Jin
    Mu, Xiangru
    Qin, Tong
    Xu, Chunjing
    Yang, Ming
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 906 - 912
  • [47] End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands
    Jafarzadeh, Mohsen
    Tadesse, Yonas
    2020 SECOND INTERNATIONAL CONFERENCE ON TRANSDISCIPLINARY AI (TRANSAI 2020), 2020, : 25 - 33
  • [48] PWOC-3D: Deep Occlusion-Aware End-to-End Scene Flow Estimation
    Saxena, Rohan
    Schuster, Rene
    Wasenmueller, Oliver
    Stricker, Didier
    2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019, : 324 - 331
  • [49] 2D and 3D CMOS MAPS with high performance pixel-level signal processing
    Traversi, Gianluca
    Gaioni, Luigi
    Manghisoni, Massimo
    Ratti, Lodovico
    Re, Valerio
    NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2011, 628 (01): : 212 - 215
  • [50] A Geometric Knowledge Oriented Single-Frame 2D-to-3D Human Absolute Pose Estimation Method
    Hu, Mengxian
    Liu, Chengju
    Li, Shu
    Yan, Qingqing
    Fang, Qin
    Chen, Qijun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7282 - 7295