DESIGN AND RESEARCH OF A MULTI-VIEW GRAPH DEEP LEARNING 3D MODEL RETRIEVAL SYSTEM BASED ON FUSION VISION-TRANSFORMER

Citations: 0

Authors:

Liang, Rong [1]

Li, Fangping [1]

Affiliation:

[1] Taiyuan Univ, Dept Art & Design, 7 Fendong St, Tanghuai Ind Pk, Taiyuan 030000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2024, Vol. 20, No. 06

Keywords:

Vision-Transformer; Multi-perspective graph convolutional neural network; 3D; Perspective image; Image entropy;

DOI:

10.24507/ijicic.20.06.1775

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The development of computer vision has given three-dimensional (3D) models a crucial role in image processing. However, compared with 2D models, 3D models have more features, which makes it difficult to extract features and to mine the correlation information between them. To address this, the study builds on a multi-perspective graph convolutional neural network and uses an image entropy weight pooling layer to improve the original view pooling layer: each perspective image is assigned a weight based on its image entropy before the view pooling operation is performed. A Vision-Transformer module is embedded into the multi-perspective graph convolutional neural network to mine the information associations between multi-view graphs. The results show that the model fused with Vision-Transformer groups features of the same category in the view graph more tightly, with a significant distance between different features. With 6, 10, and 14 view images, the model achieves accuracies of 89.0%, 92.0%, and 94.0% and mean average precision values of 80.0%, 85.0%, and 88.0%, respectively. The study improves the retrieval accuracy of 3D models and has reference value for the field of computer vision.
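The entropy-weighted view pooling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: weights are taken as each view's Shannon entropy divided by the sum over all views, pooling is an element-wise maximum over the weighted per-view features, and the function names (`image_entropy`, `entropy_weights`, `weighted_view_pool`) are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of grey levels."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(views):
    """Assumed scheme: normalise per-view entropies so the weights sum to 1."""
    entropies = [image_entropy(v) for v in views]
    total = sum(entropies)
    if total == 0:
        # Every view is a flat image: fall back to uniform weights.
        return [1.0 / len(views)] * len(views)
    return [e / total for e in entropies]


def weighted_view_pool(features, weights):
    """Assumed pooling: element-wise max over entropy-weighted view features."""
    dim = len(features[0])
    return [max(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


# Three toy "views" as flat grey-level lists: flat, two-level, four-level.
views = [[0, 0, 0, 0], [0, 255, 0, 255], [0, 85, 170, 255]]
# Toy per-view feature vectors (these would come from a CNN backbone).
features = [[9.0, 9.0], [3.0, 0.0], [0.0, 3.0]]

weights = entropy_weights(views)
pooled = weighted_view_pool(features, weights)
```

In this toy case the flat view carries no information and receives zero weight, so its large feature values do not dominate the pooled descriptor, which is the intuition behind weighting views by entropy before pooling.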


Pages: 1775-1788

Number of pages: 14

50 items in total

[41] MULTI-VIEW 3D RECONSTRUCTION FROM VIDEO WITH TRANSFORMER
Zhong, Yijie
Sun, Zhengxing
Sun, Yunhan
Luo, Shoutong
Wang, Yi
Zhang, Wei
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1661 - 1665
[42] Unsupervised Feature Learning With Graph Embedding for View-Based 3D Model Retrieval
Su, Yuting
Li, Wenhui
Nie, Weizhi
Song, Dan
Liu, An-An
IEEE ACCESS, 2019, 7 : 95285 - 95296
[43] View-based 3D model retrieval with probabilistic graph model
Gao, Yue
Tang, Jinhui
Li, Haojie
Dai, Qionghai
Zhang, Naiyao
NEUROCOMPUTING, 2010, 73 (10-12) : 1900 - 1905
[44] 3D object retrieval based on multi-view convolutional neural networks
Xi-Xi Li
Qun Cao
Sha Wei
Multimedia Tools and Applications, 2017, 76 : 20111 - 20124
[45] Feature representation for 3D object retrieval based on unconstrained multi-view
Bin Zhou
Xuanyin Wang
Multimedia Systems, 2022, 28 : 1699 - 1711
[46] Feature representation for 3D object retrieval based on unconstrained multi-view
Zhou, Bin
Wang, Xuanyin
MULTIMEDIA SYSTEMS, 2022, 28 (05) : 1699 - 1711
[47] 3D object retrieval based on multi-view convolutional neural networks
Li, Xi-Xi
Cao, Qun
Wei, Sha
MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (19) : 20111 - 20124
[48] Learning-Based Bipartite Graph Matching for View-Based 3D Model Retrieval
Lu, Ke
Ji, Rongrong
Tang, Jinhui
Gao, Yue
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (10) : 4553 - 4563
[49] A Compact Multi-View Descriptor for 3D Object Retrieval
Daras, Petros
Axenopoulos, Apostolos
CBMI: 2009 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2009, : 115 - 119
[50] Cross-view Transformer for enhanced multi-view 3D reconstruction
Shi, Wuzhen
Yin, Aixue
Li, Yingxiang
Qian, Bo
VISUAL COMPUTER, 2024,
