HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition

Citations: 3
Authors
Zhao, Yue [1, 2]
Nie, Weizhi [1]
Gao, Zan [3]
Liu, An-an [1, 2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Shandong Artificial Intelligence Inst, Jinan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D Shape Recognition; Transformer; Hierarchical Network
DOI
10.1145/3503161.3548140
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
As an important field of multimedia, 3D shape recognition has attracted much research attention in recent years. Various approaches have been proposed, among which multiview-based methods show promising performance. In general, an effective 3D shape recognition algorithm should take both local and global multiview visual information into consideration, and should explore the inherent properties of the generated 3D descriptors to guarantee the performance of feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. In HMTN, we propose a multi-level regional transformer (MLRT) module for shape descriptor generation. MLRT includes two branches that aim to extract intra-view local characteristics by modeling region-wise dependencies and to provide supervision from multiview global information at different granularities. Specifically, MLRT comprehensively considers the relations between different regions and focuses on the discriminative parts, which improves the effectiveness of the learned descriptors. Finally, we adopt a cross-granularity contrastive learning (CCL) mechanism for shape descriptor alignment in the common space. It explores and utilizes cross-granularity semantic correlation to guide the descriptor extraction process while performing instance alignment based on category information. We evaluate the proposed network on several public benchmarks, and HMTN achieves competitive performance compared with state-of-the-art (SOTA) methods.
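The abstract gives no formulas, so the following is only a rough sketch of what label-guided contrastive alignment between two descriptor granularities could look like. It uses a generic supervised InfoNCE-style loss, not the paper's CCL objective. The function names, the `coarse`/`fine` descriptor lists, and the `temperature` value are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_granularity_contrastive_loss(coarse, fine, labels, temperature=0.1):
    """Generic supervised InfoNCE between two sets of descriptors.

    Assumption (not from the paper): coarse[i] and fine[i] are descriptors
    of shape i at two granularities, and labels[i] is its category.
    For each coarse anchor, fine descriptors sharing its label are
    positives; all other fine descriptors act as negatives.
    """
    coarse = [l2_normalize(v) for v in coarse]
    fine = [l2_normalize(v) for v in fine]
    total = 0.0
    for i, anchor in enumerate(coarse):
        # Cosine similarity of the anchor to every fine descriptor.
        logits = [sum(a * b for a, b in zip(anchor, f)) / temperature
                  for f in fine]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Index i is always a positive for itself, so the list is non-empty.
        positives = [j for j, y in enumerate(labels) if y == labels[i]]
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / len(coarse)


if __name__ == "__main__":
    coarse = [[1.0, 0.0], [0.0, 1.0]]
    fine = [[0.9, 0.1], [0.2, 0.8]]
    print(cross_granularity_contrastive_loss(coarse, fine, labels=[0, 1]))
```

In this sketch the loss shrinks when descriptors of the same category agree across the two granularities and grows when they do not, which is the general behavior the abstract attributes to category-based instance alignment.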
Pages: 9
Related papers
50 in total
  • [21] MFFTNet: A Novel 3D Point Cloud Segmentation Network Based on Multi-Scale Feature Fusion and Transformer Architecture
    Bai, Hao
    Li, Xiongwei
    Meng, Qing
    Zhuo, Shulong
    Yan, Lili
    IEEE ACCESS, 2025, 13 : 9462 - 9472
  • [22] SVHAN: Sequential View Based Hierarchical Attention Network for 3D Shape Recognition
    Zhao, Yue
    Nie, Weizhi
    Liu, An-An
    Gao, Zan
    Su, Yuting
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2130 - 2138
  • [23] Multi-scale 3D Morse complexes
    Comic, Lidija
    De Floriani, Lelia
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCES AND ITS APPLICATIONS, PROCEEDINGS, 2008, : 441 - +
  • [24] Multi-view Moments Embedding Network for 3D Shape Recognition
    Xiao, Jun
    Zhang, Yuanxing
    Zhao, Pengyu
    Xiao, Kecheng
    Bian, Kaigui
    Zhang, Chunli
    Yan, Wei
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2257 - 2260
  • [25] MVPN: Multi-View Prototype Network for 3D Shape Recognition
    Wu, Zizhao
    Yang, Ping
    Wang, Yigang
    IEEE ACCESS, 2019, 7 : 130363 - 130372
  • [26] MVTN: Multi-View Transformation Network for 3D Shape Recognition
    Hamdi, Abdullah
    Giancola, Silvio
    Ghanem, Bernard
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1 - 11
  • [27] MULTI-SCALE BIDIRECTIONAL ENHANCEMENT NETWORK FOR 3D DENTAL MODEL SEGMENTATION
    Li, Zigang
    Liu, Tingting
    Wang, Jun
    Zhang, Changdong
    Jia, Xiuyi
    2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022,
  • [28] MLNet: An multi-scale line detector and descriptor network for 3D reconstruction
    Yang, Jian
    Rao, Yuan
    Cai, Qing
    Rigall, Eric
    Fan, Hao
    Dong, Junyu
    Yu, Hui
    KNOWLEDGE-BASED SYSTEMS, 2024, 289
  • [29] Multi-scale Feature Injection for Occluded 3D Human Pose and Shape Estimation
    Shi, Yunhui
    Ge, Yangyang
    Wang, Jin
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 4881 - 4886
  • [30] Capturing Shape Information with Multi-scale Topological Loss Terms for 3D Reconstruction
    Waibel, Dominik J. E.
    Atwell, Scott
    Meier, Matthias
    Marr, Carsten
    Rieck, Bastian
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV, 2022, 13434 : 150 - 159