Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Guo, Aiyuan [1]
Cao, Yanpeng [2]
Affiliations
[1] A*STAR, Institute for Infocomm Research (I2R), Singapore 138632, Singapore
[2] Zhejiang University, State Key Laboratory of Fluid Power and Mechatronic Systems, Hangzhou 310027, People's Republic of China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; similarity; diffusion
DOI
10.1109/TMM.2023.3246229
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
View-based methods have achieved state-of-the-art performance in 3D object retrieval. However, they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision-transformer-based feature fusion scheme for 3D object retrieval. Unlike existing methods, which aggregate only neighboring or adjacent views and can therefore introduce redundant information, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor from view-level features, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse them. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55 and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
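The two steps the abstract describes, multi-range view aggregation and multi-head self-attention fusion into a global descriptor, can be illustrated with a toy, unlearned sketch. This is not the MRVA-Net implementation; it rests on several assumptions: the views are assumed to be rendered on a circle so that cyclic index offsets define "ranges"; the hypothetical `ranges` parameter stands in for the paper's range choices, and aggregation is a plain average with a residual connection rather than a learned module; the self-attention uses identity query/key/value projections instead of learned ones; the descriptor is mean-pooled and L2-normalized; and retrieval ranks by cosine similarity. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def multi_range_aggregate(views: List[Vector], ranges: List[int]) -> List[Vector]:
    """Hypothetical, unlearned aggregation: enhance each view with views at cyclic offsets."""
    n, d = len(views), len(views[0])
    enhanced = []
    for i in range(n):
        context = [0.0] * d
        for r in ranges:
            # Assumes views lie on a circle, so offset r wraps around.
            left, right = views[(i - r) % n], views[(i + r) % n]
            for k in range(d):
                # Average the two views at distance r, then average over ranges.
                context[k] += (left[k] + right[k]) / (2 * len(ranges))
        # Residual connection keeps the original view feature.
        enhanced.append([views[i][k] + context[k] for k in range(d)])
    return enhanced


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: List[Vector], num_heads: int) -> List[Vector]:
    """Scaled dot-product self-attention per head, using identity Q/K/V projections (assumption)."""
    n, d = len(x), len(x[0])
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    dh = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh  # feature slice owned by this head
        for i in range(n):
            scores = [sum(x[i][k] * x[j][k] for k in range(lo, hi)) / math.sqrt(dh)
                      for j in range(n)]
            weights = softmax(scores)
            for k in range(lo, hi):
                out[i][k] = sum(weights[j] * x[j][k] for j in range(n))
    return out


def global_descriptor(views: List[Vector], ranges: Sequence[int] = (1, 2),
                      num_heads: int = 2) -> Vector:
    fused = multi_head_self_attention(multi_range_aggregate(views, list(ranges)), num_heads)
    n, d = len(fused), len(fused[0])
    # Mean-pool the attended view tokens into one descriptor, then L2-normalize.
    pooled = [sum(f[k] for f in fused) / n for k in range(d)]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


def rank_gallery(query: Vector, gallery: List[Vector]) -> List[int]:
    """Return gallery indices sorted by cosine similarity (inputs assumed unit-norm)."""
    sims = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda idx: -sims[idx])
```

For example, `rank_gallery(global_descriptor(query_views), [global_descriptor(v) for v in gallery_views])` returns gallery indices ordered from most to least similar. Letting every view attend to every other view is what allows the fusion step to weight informative views more heavily than a fixed average would; in the paper both the aggregation and the attention projections are learned.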
Pages: 9108-9119
Page count: 12