Multi-View Token Clustering and Fusion for 3D Object Recognition and Retrieval

Cited by: 1
Authors
Fan, Linlong [1]
Ge, Yanqi [1]
Li, Wen [1]
Duan, Lixin [1]
Affiliations
[1] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Chengdu, Sichuan, Peoples R China
Keywords
3D object recognition; vision transformer; multi-view learning
DOI
10.1109/ICME55011.2023.00200
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
3D object recognition has received extensive attention in recent years. Many existing methods tackle the task by rendering 3D objects from multiple views. However, most multi-view recognition methods do not utilize fine-grained information from different views, which is found to be crucial for improving 3D object representation in the multi-view setting. In this paper, we propose a transformer-based method, referred to as MVCFormer, for multi-view feature clustering and fusion. MVCFormer clusters semantically similar tokens at the same stage and selects representative fine-grained features, which helps eliminate feature redundancy, remove cluttered backgrounds, and make the selected features more diverse. In addition, the model integrates the selected features from all stages through a cross-attention fusion method to obtain a discriminative 3D object representation. Extensive experiments on benchmark datasets (e.g., ModelNet40, ModelNet10, ShapeNetCore55, and RGBD) clearly demonstrate the effectiveness of the proposed MVCFormer over existing baselines.
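The two ideas named in the abstract, clustering tokens to pick representatives and fusing the picks with cross-attention, can be illustrated with a minimal NumPy sketch. This is not MVCFormer's implementation: it assumes plain k-means for the clustering step and a single-head attention without learned projections for the fusion step, and every name in it (`cluster_and_select`, `cross_attention_fuse`, the token shapes, the mean-pooled query) is hypothetical.

```python
import numpy as np

def cluster_and_select(tokens, k, iters=10, seed=0):
    """Assumed stand-in for token clustering: plain k-means over tokens (N, D).

    Returns one representative per cluster, chosen as the token nearest
    to each centroid, with shape (k, D).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct tokens (fancy indexing copies).
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance of every token to every centroid: (N, k).
        dist = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dist = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return tokens[dist.argmin(axis=0)]

def cross_attention_fuse(query, selected):
    """Single-head scaled dot-product attention with no learned weights.

    query: (1, D) attends over selected: (M, D); returns a (1, D) descriptor.
    """
    scores = query @ selected.T / np.sqrt(query.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ selected

# Hypothetical setup: 3 stages, 12 views x 49 patch tokens each, 32-dim features.
rng = np.random.default_rng(0)
stage_tokens = [rng.normal(size=(12 * 49, 32)) for _ in range(3)]

# Select 8 representatives per stage, then pool them across stages.
selected = np.concatenate([cluster_and_select(t, k=8) for t in stage_tokens])

# Assumed query: the mean of the selected tokens.
query = selected.mean(axis=0, keepdims=True)
descriptor = cross_attention_fuse(query, selected)
print(descriptor.shape)  # (1, 32)
```

In a trained model the query, keys, and values would pass through learned linear projections and the clustering would operate on real transformer tokens; the sketch only shows how the data flows from per-stage selection to a single fused descriptor.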
Pages: 1145-1150
Number of pages: 6