HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition

Cited by: 3
Authors
Zhao, Yue [1, 2]
Nie, Weizhi [1]
Gao, Zan [3]
Liu, An-an [1, 2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Shandong Artificial Intelligence Inst, Jinan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D Shape Recognition; Transformer; Hierarchical Network
DOI
10.1145/3503161.3548140
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
As an important field of multimedia, 3D shape recognition has attracted much research attention in recent years. Various approaches have been proposed, among which multiview-based methods show promising performance. In general, an effective 3D shape recognition algorithm should take both local and global multiview visual information into consideration, and should explore the inherent properties of the generated 3D descriptors to guarantee the performance of feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. In HMTN, we propose a multi-level regional transformer (MLRT) module for shape descriptor generation. MLRT includes two branches: one extracts intra-view local characteristics by modeling region-wise dependencies, and the other provides supervision from multiview global information at different granularities. Specifically, MLRT comprehensively considers the relations between different regions and focuses on the discriminative parts, which improves the effectiveness of the learned descriptors. Finally, we adopt a cross-granularity contrastive learning (CCL) mechanism for shape descriptor alignment in the common space. It explores and utilizes the cross-granularity semantic correlation to guide the descriptor extraction process while performing instance alignment based on category information. We evaluate the proposed network on several public benchmarks, and HMTN achieves competitive performance compared with state-of-the-art (SOTA) methods.
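The abstract does not give the CCL loss, so the following is only an illustrative sketch of the general idea of category-based instance alignment, using a generic supervised contrastive loss rather than the authors' formulation. The function name `supervised_contrastive_loss`, the `temperature` value, and the toy descriptors are all assumptions made for this example.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (an assumption, not HMTN's CCL).

    Descriptors with the same category label are pulled together and all
    others are pushed apart. Anchors without any positive are skipped.
    """
    # L2-normalize each descriptor so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        losses.append(
            -sum(sims[j] - log_denom for j in positives) / len(positives)
        )
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D descriptors: the first two and the last two point the same way.
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(descriptors, [0, 0, 1, 1])
loss_misaligned = supervised_contrastive_loss(descriptors, [0, 1, 0, 1])
```

In this toy setup the loss is small when the labels agree with the geometric clusters and large when they do not, which is the behavior an alignment objective of this kind relies on.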
Pages: 9