HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition

Cited by: 3

Authors:

Zhao, Yue [1, 2]

Nie, Weizhi [1]

Gao, Zan [3]

Liu, An-an [1, 2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China

[3] Shandong Artificial Intelligence Inst, Jinan, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

3D Shape Recognition; Transformer; Hierarchical Network;

DOI:

10.1145/3503161.3548140

Chinese Library Classification (CLC) number:

TP39 [Computer Applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

As an important field of multimedia, 3D shape recognition has attracted much research attention in recent years. Various approaches have been proposed, within which the multiview-based methods show their promising performances. In general, an effective 3D shape recognition algorithm should take both the multiview local and global visual information into consideration, and explore the inherent properties of generated 3D descriptors to guarantee the performance of feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. In HMTN, we propose a multi-level regional transformer (MLRT) module for shape descriptor generation. MLRT includes two branches that aim to extract the intra-view local characteristics by modeling region-wise dependencies and give the supervision of multiview global information under different granularities. Specifically, MLRT can comprehensively consider the relations of different regions and focus on the discriminative parts, which improves the effectiveness of the learned descriptors. Finally, we adopt the cross-granularity contrastive learning (CCL) mechanism for shape descriptor alignment in the common space. It can explore and utilize the cross-granularity semantic correlation to guide the descriptor extraction process while performing the instance alignment based on the category information. We evaluate the proposed network on several public benchmarks, and HMTN achieves competitive performance compared with the state-of-the-art (SOTA) methods.


Pages: 9
