Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Guo, Aiyuan [1]
Cao, Yanpeng [2]
Affiliations
[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;
DOI
10.1109/TMM.2023.3246229
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
View-based methods have achieved state-of-the-art performance in 3D object retrieval. However, they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision transformer based feature fusion scheme for 3D object retrieval. Unlike existing methods, which aggregate only neighboring or adjacent views and can therefore bring in redundant information, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse the view-level features. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55 and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
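The abstract names two ideas, aggregation of views at several ranges and multi-head self-attention fusion, without giving their details. The following is only an illustrative sketch of those two ideas and not the MRVA-Net implementation. It assumes the views come from cameras on a circular ring, and every name and setting in it (`multi_range_aggregate`, `self_attention_fuse`, the ranges `(1, 2, 4)`, mean pooling, identity query/key/value projections instead of learned ones) is hypothetical.

```python
import math


def multi_range_aggregate(views, ranges=(1, 2, 4)):
    """Enhance each view feature with views at several circular offsets.

    views: list of V feature vectors (lists of floats), assumed to be
    ordered around a camera ring. For each view i and each range r, the
    views i-r, i and i+r are averaged; the per-range results are then
    averaged. Both choices are assumptions made for illustration.
    """
    n, dim = len(views), len(views[0])
    enhanced = []
    for i in range(n):
        acc = [0.0] * dim
        for r in ranges:
            group = [views[(i - r) % n], views[i], views[(i + r) % n]]
            for d in range(dim):
                acc[d] += sum(g[d] for g in group) / len(group)
        enhanced.append([a / len(ranges) for a in acc])
    return enhanced


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(views, num_heads=2):
    """Fuse view features into one descriptor with multi-head self-attention.

    Uses scaled dot-product attention per head. The query, key and value
    projections are the identity here (an assumption; a trained model
    would learn them), and the attended views are mean-pooled.
    """
    n, dim = len(views), len(views[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    attended = [[0.0] * dim for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [v[lo:hi] for v in views]
        for i in range(n):
            scores = [
                sum(a * b for a, b in zip(heads[i], heads[j])) / math.sqrt(head_dim)
                for j in range(n)
            ]
            weights = softmax(scores)
            for d in range(head_dim):
                attended[i][lo + d] = sum(weights[j] * heads[j][d] for j in range(n))
    # Mean-pool the attended view features into a single global descriptor.
    return [sum(row[d] for row in attended) / n for d in range(dim)]
```

Under these assumptions, a descriptor for one object would be obtained as `self_attention_fuse(multi_range_aggregate(view_features))`, where `view_features` holds one feature vector per rendered view.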
Pages: 9108-9119
Page count: 12
Related papers
50 in total
  • [31] View Context: A 3D Model Feature for Retrieval
    Li, Bo
    Johan, Henry
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 185 - 195
  • [32] Cross Modality Fusion Network with Feature Alignment and Salient Object Exchange for Single Image 3D Shape Retrieval
    Diao, Zhenyu
    Niu, Dongmei
    Han, Xiaofan
    Zhao, Xiuyang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VI, 2025, 15036 : 476 - 490
  • [33] Multi-Person 3D Motion Prediction with Multi-Range Transformers
    Wang, Jiashun
    Xu, Huazhe
    Narasimhan, Medhini
    Wang, Xiaolong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [34] Statistical Score Fusion for 3D Object Retrieval
    Akguel, Ceyhun Burak
    Sankur, Buelent
    Yemez, Yuecel
    2008 IEEE 16TH SIGNAL PROCESSING, COMMUNICATION AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2008, : 284 - +
  • [35] OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
    Qi, Zhangyang
    Wang, Jiaqi
    Wu, Xiaoyang
    Zhao, Hengshuang
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 1188 - 1197
  • [36] Empirical Evaluation of Dissimilarity Measures for 3D Object Retrieval with Application to Multi-Feature Retrieval
    Gregor, Robert
    Lamprecht, Andreas
    Sipiran, Ivan
    Schreck, Tobias
    Bustos, Benjamin
    2015 13TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2015,
  • [37] 3D object retrieval with multi-feature collaboration and bipartite graph matching
    Zhang, Yan
    Jiang, Feng
    Rho, Seungmin
    Liu, Shaohui
    Zhao, Debin
    Ji, Rongrong
    NEUROCOMPUTING, 2016, 195 : 40 - 49
  • [38] 3D Object Retrieval Based on Multi-View Latent Variable Model
    Liu, An-An
    Nie, Wei-Zhi
    Su, Yu-Ting
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (03) : 868 - 880
  • [39] Hierarchical multi-view context modelling for 3D object classification and retrieval
    Liu, An-An
    Zhou, Heyu
    Nie, Weizhi
    Liu, Zhenguang
    Liu, Wu
    Xie, Hongtao
    Mao, Zhendong
    Li, Xuanya
    Song, Dan
    INFORMATION SCIENCES, 2021, 547 : 984 - 995
  • [40] Triplet-Center Loss for Multi-View 3D Object Retrieval
    He, Xinwei
    Zhou, Yang
    Zhou, Zhichao
    Bai, Song
    Bai, Xiang
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1945 - 1954