A Quantum-Inspired Framework in Leader-Servant Mode for Large-Scale Multi-Modal Place Recognition

Cited by: 0
Authors:
Zhang, Ruonan [1 ]
Li, Ge [2 ]
Gao, Wei [3 ,4 ]
Liu, Shan [5 ]
Affiliations:
[1] Ningxia Univ, Sch Adv Interdisciplinary Studies, Zhongwei 755000, Peoples R China
[2] Peking Univ, Sch Elect & Comp Engn SECE, Shenzhen Grad Sch, Guangdong Prov Key Lab Ultra High Definit Immers M, Shenzhen 518055, Peoples R China
[3] Peking Univ, Sch Elect & Comp Engn SECE, Shenzhen Grad Sch, Guangdong Prov Key Lab Ultra High Definit Immers M, Shenzhen 518055, Peoples R China
[4] Peng Cheng Natl Lab, Shenzhen 518066, Peoples R China
[5] Tencent, Media Lab, Palo Alto, CA 94301 USA
Funding:
National Natural Science Foundation of China
Keywords:
Training; Point cloud compression; Feature extraction; Interference; Wave functions; Quantum mechanics; Image recognition; Fuses; Convolution; Three-dimensional displays; Multi-modal; place recognition; 3D point cloud; image; feature fusion
DOI:
10.1109/TITS.2024.3497574
Chinese Library Classification (CLC):
TU [Building Science]
Discipline Code:
0813
Abstract:
Multi-modal place recognition aims to exploit the diverse information carried by different modalities to invigorate place recognition tasks. The key challenges lie in the representation gap between modalities, the feature fusion method, and the relationships among the fused features. Most existing methods are uni-modal and leave these challenges largely unresolved. To address them, encouraged by the double-slit experiment in physics and by cooperative working modes, we introduce a quantum-theory-inspired leader-servant multi-modal framework for large-scale place recognition. Two key modules are designed: a quantum representation module and an interference-aware fusion module. The former captures the diversity of multi-modal data and bridges the representation gap; the latter fuses the multi-modal features effectively under the guidance of quantum theory. In addition, we propose a leader-servant training strategy for stable training, in which three loss cases are considered: the multi-modal loss acts as the leader to preserve overall characteristics, while the uni-modal losses act as servants that lighten each single modality's influence on the leader. Furthermore, the framework remains compatible with uni-modal place recognition. Finally, experiments on three datasets demonstrate the efficiency, generalization, and robustness of the proposed method compared with existing methods.
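The abstract's two central ideas, a wave-function-style representation with interference-based fusion and a leader-servant loss, can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions, not the authors' implementation: QuantumHead, interference_fuse, leader_servant_loss, and the weight lam are all hypothetical names, and the paper's actual modules may differ.

```python
# Minimal sketch (assumed, not the paper's code) of quantum-inspired
# fusion and a leader-servant loss for image + point-cloud descriptors.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantumHead(nn.Module):
    """Maps a real feature vector to a wave-function-like state
    (amplitude r, phase theta), following the common quantum-inspired
    convention z = r * exp(i * theta). Hypothetical module."""
    def __init__(self, dim: int):
        super().__init__()
        self.amp = nn.Linear(dim, dim)    # amplitude branch
        self.phase = nn.Linear(dim, dim)  # phase branch

    def forward(self, x):
        r = F.normalize(torch.sigmoid(self.amp(x)), dim=-1)  # non-negative, L2-normalized amplitudes
        theta = math.pi * torch.tanh(self.phase(x))          # phases constrained to (-pi, pi)
        return r, theta

def interference_fuse(r1, t1, r2, t2):
    """Two-source interference intensity |z1 + z2|^2
    = r1^2 + r2^2 + 2*r1*r2*cos(t1 - t2); the cross term lets the two
    modalities reinforce or cancel, echoing the double-slit intuition."""
    return r1 ** 2 + r2 ** 2 + 2.0 * r1 * r2 * torch.cos(t1 - t2)

def leader_servant_loss(loss_fused, loss_img, loss_pc, lam=0.1):
    """Multi-modal (fused) loss leads; down-weighted uni-modal losses
    serve as regularizers, covering the three loss cases the abstract
    mentions. lam is an assumed hyperparameter."""
    return loss_fused + lam * (loss_img + loss_pc)

# Usage: fuse an image descriptor with a point-cloud descriptor.
img_feat, pc_feat = torch.randn(8, 256), torch.randn(8, 256)
head_img, head_pc = QuantumHead(256), QuantumHead(256)
fused = interference_fuse(*head_img(img_feat), *head_pc(pc_feat))  # (8, 256) place descriptor
```

In this reading, a metric-learning objective (e.g., a triplet loss) would be computed separately on the fused, image-only, and point-cloud-only descriptors, and leader_servant_loss would combine them so that the servants regularize each branch without overriding the leader.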
Pages: 2027 - 2039
Page count: 13
Related Papers:
50 items in total
  • [41] IBISCape: A Simulated Benchmark for multi-modal SLAM Systems Evaluation in Large-scale Dynamic Environments
    Soliman, Abanob
    Bonardi, Fabien
    Sidibé, Désiré
    Bouchafa, Samia
    Journal of Intelligent & Robotic Systems, 2022, 106
  • [42] Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
    Ko, Myeongseob
    Jin, Ming
    Wang, Chenguang
    Jia, Ruoxi
    2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 4848 - 4858
  • [43] WenLan: Efficient Large-Scale Multi-Modal Pre-Training on Real World Data
    Song, Ruihua
    MMPT '21: Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding, 2021: 3
  • [44] GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction
    Qin, Yiming
    Chi, Xiaoyu
    Sheng, Bin
    Lau, Rynson W. H.
    The Visual Computer, 2023, 39(8): 3597 - 3607
  • [45] Biological Insight From Large-Scale Studies of Bipolar Disorder With Multi-Modal Imaging and Genomics
    Andreassen, Ole
    Houenou, Josselin
    Duchesnay, Edouard
    Favre, Pauline
    Pauling, Melissa
    van Haren, Neeltje
    Brouwer, Rachel
    de Zwarte, Sonja
    Thompson, Paul
    Ching, Christopher
    Biological Psychiatry, 2018, 83(9): S49 - S50
  • [47] Application of smart card data in validating a large-scale multi-modal transit assignment model
    Tavassoli, A.
    Mesbah, M.
    Hickman, M.
    Springer, 2018, 10: 1 - 21
  • [48] Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval
    Lu, Xu
    Liu, Li
    Nie, Liqiang
    Chang, Xiaojun
    Zhang, Huaxiang
    IEEE Transactions on Multimedia, 2021, 23: 4541 - 4554
  • [49] Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
    Zeng, Zhaoyang
    Luo, Yongsheng
    Liu, Zhenhua
    Rao, Fengyun
    Li, Dian
    Guo, Weidong
    Wen, Zhen
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 3128 - 3137
  • [50] CASIA-SURF: A Large-Scale Multi-Modal Benchmark for Face Anti-Spoofing
    Zhang, S.
    Liu, A.
    Wan, J.
    Liang, Y.
    Guo, G.
    Escalera, S.
    Escalante, H. J.
    Li, S. Z.
    IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2(2): 182 - 193