Representation Learning Based on Vision Transformer

Times Cited: 0
Authors
Ran, Ruisheng [1 ]
Gao, Tianyu [1 ]
Hu, Qianwei [2 ]
Zhang, Wenfeng [1 ]
Peng, Shunshun [1 ]
Fang, Bin [3 ]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Chongqing Dinghui Informat Technol Co Ltd, Chongqing 401147, Peoples R China
[3] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Representation learning; Transformer; data visualization; image reconstruction; zero-shot learning; DEEP; DIMENSIONALITY; NETWORK;
DOI
10.1142/S0218001424590043
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In recent years, with the rapid development of information technology, the volume of image data has grown exponentially, yet these data typically contain a large amount of redundant information. To extract effective features from images and reduce redundancy, a representation learning method based on the Vision Transformer (ViT) is proposed; to the best of our knowledge, this is the first application of the Transformer to zero-shot learning (ZSL). The method adopts a symmetric encoder-decoder structure: the encoder incorporates the Multi-Head Self-Attention (MSA) mechanism of ViT to reduce the dimensionality of image features, eliminate redundant information, and lower the computational burden, thereby extracting effective features, while the decoder reconstructs the image data. The representation learning capability of the proposed method is evaluated on a variety of tasks, including data visualization, image reconstruction, face recognition, and ZSL. Comparison with state-of-the-art representation learning methods shows that the method achieves outstanding results, validating its effectiveness in the field of representation learning.
Pages: 23
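The abstract describes a symmetric encoder-decoder in which a ViT-style MSA encoder compresses image features into a low-dimensional representation and a mirrored decoder reconstructs the image. The paper's exact architecture is not given in this record, so the following is only a minimal PyTorch sketch of that idea; the `ViTAutoencoder` name, all layer sizes, and the single-block encoder/decoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a symmetric ViT-style autoencoder (hypothetical layer
# sizes and names; not the paper's actual implementation).
import torch
import torch.nn as nn

class ViTAutoencoder(nn.Module):
    def __init__(self, img_size=32, patch=4, dim=64, code_dim=16, heads=4):
        super().__init__()
        self.side = img_size // patch
        n_patches = self.side ** 2
        # Patch embedding: split the image into patches, project each to dim.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        # Encoder: Transformer block with Multi-Head Self-Attention (MSA),
        # followed by a linear bottleneck to a low-dimensional code.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.to_code = nn.Linear(dim, code_dim)
        # Decoder mirrors the encoder and maps tokens back to pixel patches.
        self.from_code = nn.Linear(code_dim, dim)
        self.decoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.to_pixels = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos
        code = self.to_code(self.encoder(tokens))   # (B, n_patches, code_dim)
        h = self.decoder(self.from_code(code))      # (B, n_patches, dim)
        h = h.transpose(1, 2).reshape(h.size(0), -1, self.side, self.side)
        return self.to_pixels(h), code              # reconstruction + code

model = ViTAutoencoder()
img = torch.randn(2, 3, 32, 32)
recon, code = model(img)
loss = nn.functional.mse_loss(recon, img)  # reconstruction objective
```

Training such a model against the reconstruction loss yields the low-dimensional `code` tokens, which can then feed the downstream tasks listed in the abstract (data visualization, face recognition, ZSL).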
Related Papers
50 records in total
  • [21] A Transformer-based Framework for Multivariate Time Series Representation Learning
    Zerveas, George
    Jayaraman, Srideepika
    Patel, Dhaval
    Bhamidipaty, Anuradha
    Eickhoff, Carsten
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2114 - 2124
  • [22] Vision Transformer Adapters for Generalizable Multitask Learning
    Bhattacharjee, Deblina
    Susstrunk, Sabine
    Salzmann, Mathieu
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18969 - 18980
  • [23] Online Continual Learning with Contrastive Vision Transformer
    Wang, Zhen
    Liu, Liu
    Kong, Yajing
    Guo, Jiaxian
    Tao, Dacheng
    [J]. COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680 : 631 - 650
  • [24] Binary representation learning in computer vision
    Shen, Fumin
    Yang, Yang
    Zhang, Hanwang
    [J]. NEUROCOMPUTING, 2016, 213 : 1 - 4
  • [25] A Multitask Learning-Based Vision Transformer for Plant Disease Localization and Classification
    Hemalatha, S.
    Jayachandran, Jai Jaganath Babu
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [26] UGTransformer: Unsupervised Graph Transformer Representation Learning
    Xu, Lixiang
    Liu, Haifeng
    Cui, Qingzhe
    Luo, Bin
    Li, Ning
    Chen, Yan
    Tang, Yuanyan
[J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [27] Graph Propagation Transformer for Graph Representation Learning
    Chen, Zhe
    Tan, Hao
    Wang, Tao
    Shen, Tianrun
    Lu, Tong
    Peng, Qiuying
    Cheng, Cheng
    Qi, Yue
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3559 - 3567
  • [28] A novel selective learning based transformer encoder architecture with enhanced word representation
    Ansar, Wazib
    Goswami, Saptarsi
    Chakrabarti, Amlan
    Chakraborty, Basabi
    [J]. APPLIED INTELLIGENCE, 2023, 53 (08) : 9424 - 9443
  • [30] Molecular representation learning based on Transformer with fixed-length padding method
    Wu, Yichu
    Yang, Yang
    Zhang, Ruimeng
    Chen, Zijian
    Jin, Meichen
    Zou, Yi
    Wang, Zhonghua
    Wu, Fanhong
    [J]. JOURNAL OF MOLECULAR STRUCTURE, 2025, 1319