Representation Learning Based on Vision Transformer

Cited by: 0
Authors
Ran, Ruisheng [1]
Gao, Tianyu [1]
Hu, Qianwei [2]
Zhang, Wenfeng [1]
Peng, Shunshun [1]
Fang, Bin [3]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Chongqing Dinghui Informat Technol Co Ltd, Chongqing 401147, Peoples R China
[3] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Representation learning; Transformer; data visualization; image reconstruction; zero-shot learning; DEEP; DIMENSIONALITY; NETWORK;
DOI
10.1142/S0218001424590043
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, with the rapid development of information technology, the volume of image data has grown exponentially. However, these datasets typically contain a large amount of redundant information. To extract effective features from images and reduce their redundancy, we propose a representation learning method based on the Vision Transformer (ViT); to the best of our knowledge, this is the first time the Transformer has been applied to zero-shot learning (ZSL). The method adopts a symmetric encoder-decoder structure. The encoder incorporates the Multi-Head Self-Attention (MSA) mechanism of ViT to reduce the dimensionality of image features, eliminate redundant information, and decrease the computational burden, and thereby extracts features effectively; the decoder reconstructs the image data. We evaluated the representation learning capability of the proposed method on several tasks, including data visualization, image reconstruction, face recognition, and ZSL. Compared with state-of-the-art representation learning methods, the proposed method achieves outstanding results, which validates its effectiveness for representation learning.
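The encoder-decoder idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' architecture: the record gives no layer sizes or training details, so the dimensions, the random untrained weights, the linear down/up projections, and every function name below (`multi_head_self_attention`, `encode`, `decode`, `init_params`) are assumptions made for illustration only. Positional embeddings, residual connections, layer normalization, and training are all omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Random (untrained) weight matrix; a stand-in for learned parameters."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention split across num_heads heads."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    head_dim = len(q[0]) // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        scores = matmul(qh, list(zip(*kh)))  # qh @ kh^T
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the heads token by token, then apply the output projection.
    concat = [sum((head[i] for head in heads), []) for i in range(len(tokens))]
    return matmul(concat, wo)


def init_params(dim, latent_dim, num_heads, seed=0):
    """Hypothetical parameter set: one attention block per side plus projections."""
    rng = random.Random(seed)

    def attn():
        return tuple(rand_matrix(dim, dim, rng) for _ in range(4))

    return {
        "heads": num_heads,
        "enc_attn": attn(),
        "dec_attn": attn(),
        "down": rand_matrix(dim, latent_dim, rng),  # assumed dimensionality reduction
        "up": rand_matrix(latent_dim, dim, rng),    # assumed mirror of "down"
    }


def encode(tokens, params):
    """Attend over patch tokens, then project each token to a lower dimension."""
    attended = multi_head_self_attention(tokens, *params["enc_attn"], params["heads"])
    return matmul(attended, params["down"])


def decode(latent, params):
    """Mirror of encode: project back up, then attend to rebuild the tokens."""
    expanded = matmul(latent, params["up"])
    return multi_head_self_attention(expanded, *params["dec_attn"], params["heads"])


# Four made-up patch tokens of dimension 8, compressed to dimension 2 and rebuilt.
params = init_params(dim=8, latent_dim=2, num_heads=2)
patches = rand_matrix(4, 8, random.Random(1))
latent = encode(patches, params)
reconstruction = decode(latent, params)
```

With these assumed sizes, `latent` holds one 2-dimensional vector per patch and `reconstruction` has the same shape as `patches`. In a trained model the weights would be fitted by minimizing a reconstruction loss between `patches` and `reconstruction`; here they are random, so the output is not a meaningful reconstruction.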
Pages: 23