Representation Learning Based on Vision Transformer

Cited by: 0

Authors:

Ran, Ruisheng [1]

Gao, Tianyu [1]

Hu, Qianwei [2]

Zhang, Wenfeng [1]

Peng, Shunshun [1]

Fang, Bin [3]

Affiliations:

[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China

[2] Chongqing Dinghui Informat Technol Co Ltd, Chongqing 401147, Peoples R China

[3] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2024, Vol. 38, No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Representation learning; Transformer; data visualization; image reconstruction; zero-shot learning; DEEP; DIMENSIONALITY; NETWORK;

DOI:

10.1142/S0218001424590043

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, with the rapid development of information technology, the volume of image data has grown exponentially. However, these datasets typically contain a large amount of redundant information. To extract effective features from images and reduce redundancy, we propose a representation learning method based on the Vision Transformer (ViT); to the best of our knowledge, this is the first application of the Transformer to zero-shot learning (ZSL). The method adopts a symmetric encoder-decoder structure. The encoder incorporates the Multi-Head Self-Attention (MSA) mechanism of ViT to reduce the dimensionality of image features, eliminate redundant information, and decrease the computational burden, and thereby extracts effective features. The decoder reconstructs the image data. We evaluated the representation learning capability of the proposed method on several tasks: data visualization, image reconstruction, face recognition, and ZSL. Comparisons with state-of-the-art representation learning methods validate the effectiveness of the proposed method for representation learning.
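The encoder-decoder idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: the single attention layer, the residual connection, the linear projections `w_enc` and `w_dec`, and all sizes below are assumptions chosen only to show how multi-head self-attention over image patches can feed a low-dimensional code that a decoder maps back to patch space.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over patch tokens of shape (n, d)."""
    n, d = tokens.shape
    head_dim = d // num_heads

    def split(x):
        # (n, d) -> (num_heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, n)
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    heads = weights @ v                                     # (heads, n, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(n, d)         # concatenate heads
    return merged @ w_o, weights


def encode_decode(patches, params, num_heads):
    """Hypothetical encoder (MSA + linear reduction) and linear decoder."""
    attended, _ = multi_head_self_attention(patches, *params["attn"], num_heads)
    hidden = patches + attended          # residual connection (assumed)
    code = hidden @ params["w_enc"]      # reduce (n, d) -> (n, code_dim)
    recon = code @ params["w_dec"]       # map back to (n, d)
    return code, recon


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_patches, patch_dim, code_dim, num_heads = 16, 48, 8, 4
params = {
    # w_q, w_k, w_v, w_o in that order
    "attn": [rng.normal(scale=0.1, size=(patch_dim, patch_dim)) for _ in range(4)],
    "w_enc": rng.normal(scale=0.1, size=(patch_dim, code_dim)),
    "w_dec": rng.normal(scale=0.1, size=(code_dim, patch_dim)),
}
patches = rng.normal(size=(n_patches, patch_dim))

code, recon = encode_decode(patches, params, num_heads)
# Mean squared reconstruction error: the quantity a training loop would minimize.
loss = float(np.mean((recon - patches) ** 2))
```

With untrained random weights the reconstruction error is meaningless; the sketch only shows the data flow from patches to a reduced code and back. A real model would stack several such blocks and train all weights by minimizing `loss`.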


Pages: 23
