Deep 6-DoF camera relocalization in variable and dynamic scenes by multitask learning

Cited by: 4
Authors
Wang, Junyi [1, 2]
Qi, Yue [1, 2, 3]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Beihang Univ, Qingdao Res Inst, Qingdao, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-based localization; Deep learning; Dynamic localization; Multitask learning; LOCALIZATION; ROBUST; TRACKING;
DOI
10.1007/s00138-023-01388-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Direct visual localization with convolutional neural networks has recently attracted attention because it makes the process end-to-end. However, such methods do not use 3D information, which limits their accuracy, and a single input image is ambiguous in scenes that look similar from different positions. Relocalization in variable or dynamic scenes also remains challenging. To address these concerns, we propose two multitask relocalization networks, MMLNet and MMLNet+, that estimate the 6-DoF camera pose in static, variable and dynamic scenes. First, to address the lack of datasets for variable scenes, we construct a variable-scene dataset with a semiautomatic process that combines structure-from-motion (SfM) and multi-view stereo (MVS) algorithms with a few manual labels. Using this process, we capture and generate three scenes: an office, a bedroom and a sitting room. Second, to strengthen the link between 2D images and 3D poses, we design a multitask network, MMLNet, that regresses both the camera pose and the scene point cloud, and we add the Chamfer distance to the original pose loss to optimize it. MMLNet also learns pose trajectory features by applying LSTM layers to an additional pose-array input, which removes the limitation of single-image input. Building on MMLNet and targeting dynamic and variable scenes, MMLNet+ adds an auxiliary segmentation branch that distinguishes the fixed, changeable and dynamic parts of the input image. Furthermore, we define a feature fusion block that shares features among the three tasks, further improving performance in dynamic and variable environments. Finally, experiments on static, dynamic and our constructed variable datasets demonstrate state-of-the-art relocalization performance for MMLNet and MMLNet+. The positive effects of the pose learning part, the reconstruction branch and the segmentation task are also demonstrated.
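The abstract states that the Chamfer distance is added to the pose loss but gives no formula or weights. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes the common symmetric squared-distance form of the Chamfer distance and a PoseNet-style pose term (translation error plus a weighted quaternion error). The function names and the weights `beta` and `lam` are hypothetical.

```python
import math


def chamfer_distance(pred_points, true_points):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    For every point, take the squared Euclidean distance to its nearest
    neighbour in the other set, average per set, and sum both directions.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(pred_points, true_points) + one_way(true_points, pred_points)


def pose_loss(pred_t, true_t, pred_q, true_q, beta=1.0):
    """Translation error plus beta times quaternion error (assumed form)."""
    norm = math.sqrt(sum(c * c for c in pred_q))
    unit_q = [c / norm for c in pred_q]  # normalise the predicted quaternion
    return math.dist(pred_t, true_t) + beta * math.dist(unit_q, true_q)


def multitask_loss(pred_t, true_t, pred_q, true_q,
                   pred_points, true_points, beta=1.0, lam=1.0):
    """Pose loss plus lam times the Chamfer distance; beta and lam are hypothetical."""
    return (pose_loss(pred_t, true_t, pred_q, true_q, beta)
            + lam * chamfer_distance(pred_points, true_points))


if __name__ == "__main__":
    loss = multitask_loss(
        pred_t=(0.0, 0.0, 0.0), true_t=(3.0, 4.0, 0.0),
        pred_q=(2.0, 0.0, 0.0, 0.0), true_q=(1.0, 0.0, 0.0, 0.0),
        pred_points=[(0.0, 0.0, 0.0)], true_points=[(1.0, 0.0, 0.0)],
        lam=0.5,
    )
    print(loss)
```

The nearest-neighbour search here is quadratic in the number of points, which is acceptable for illustration. The quaternion term ignores the sign ambiguity between q and -q, so ground-truth quaternions would need a consistent sign convention.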
Pages: 15
Source
Machine Vision and Applications, 2023, 34