Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Guo, Aiyuan [1]
Cao, Yanpeng [2]
Affiliations
[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;
DOI
10.1109/TMM.2023.3246229
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
View-based methods have achieved state-of-the-art performance in 3D object retrieval, but they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision-transformer-based feature fusion scheme for 3D object retrieval. Existing methods aggregate only neighboring or adjacent views, which can introduce redundant information. In contrast, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse the view-level features. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55 and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
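The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the MRVA-Net implementation: the function names, the circular view layout, the choice of ranges, the plain averaging, and the identity query/key/value projections are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def multi_range_aggregate(views, ranges=(1, 2, 4)):
    """Enhance each view with the mean of views at several circular offsets.

    Assumption: views lie on a ring, so view i's partners at range r are
    views (i - r) % V and (i + r) % V. The ranges are illustrative only.
    """
    num_views, dim = len(views), len(views[0])
    count = 1 + 2 * len(ranges)  # the view itself plus two partners per range
    enhanced = []
    for i in range(num_views):
        acc = list(views[i])  # start from the view's own feature
        for r in ranges:
            for j in ((i - r) % num_views, (i + r) % num_views):
                for d in range(dim):
                    acc[d] += views[j][d]
        enhanced.append([a / count for a in acc])
    return enhanced

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_fuse(views, num_heads=2):
    """Fuse view features with multi-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections (no learned weights).
    Each head attends over its own slice of the feature vector, and the
    attended views are mean-pooled into a single global descriptor.
    """
    num_views, dim = len(views), len(views[0])
    assert dim % num_heads == 0, "feature size must split evenly across heads"
    head_dim = dim // num_heads
    attended = [[0.0] * dim for _ in range(num_views)]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [v[lo:hi] for v in views]
        for i, q in enumerate(heads):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)
                      for k in heads]
            weights = softmax(scores)
            for d in range(head_dim):
                attended[i][lo + d] = sum(w * k[d] for w, k in zip(weights, heads))
    # Mean-pool the attended view features into one descriptor.
    return [sum(row[d] for row in attended) / num_views for d in range(dim)]

# Hypothetical input: 12 views with 4-dimensional features.
views = [[float(i), float(i % 3), 1.0, 0.5] for i in range(12)]
descriptor = self_attention_fuse(multi_range_aggregate(views))
```

In a trained network the view features would come from a CNN backbone and the attention projections would be learned; the sketch only shows how views at several ranges can contribute to each view representation before attention-based pooling produces one descriptor per object.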
Pages: 9108-9119
Page count: 12