Integrated crossing pooling of representation learning for Vision Transformer

Cited: 0
Authors
Xu, Libo [1 ]
Li, Xingsen [2 ]
Huang, Zhenrui [1 ]
Sun, Yucheng [3 ]
Wang, Jiagong [1 ]
Affiliations
[1] NingboTech Univ, Ningbo, Peoples R China
[2] Guangdong Univ Technol, Guangzhou, Peoples R China
[3] China E Port Data Ctr, Ningbo Branch, Ningbo, Peoples R China
Keywords
vision transformer; ViT; pooling method; class token
DOI
10.1145/3498851.3499004
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In recent years, transformer architectures such as ViT have been widely adopted in computer vision. In the ViT model, a learnable class token is prepended to the token sequence, and its output after the full transformer encoder is taken as the final representation vector, which is then passed through a multi-layer perceptron (MLP) head to produce the classification prediction. The class token can be seen as an information aggregation of all other tokens, but we argue that global pooling of the tokens aggregates information more effectively and intuitively. In this paper, we propose a new pooling method, called cross pooling, that replaces the class token to obtain the representation vector of the input image; it extracts better features and effectively improves model performance without increasing the computational cost. Through extensive experiments, we demonstrate that cross pooling achieves significant improvements over the original class token and over existing global pooling methods such as average pooling and max pooling.
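To make the readout difference concrete, below is a minimal PyTorch sketch contrasting a pooling-based readout of the encoder's patch tokens with the class-token-free baselines the abstract mentions. The class PoolingHead and the mode argument are illustrative names, not from the paper, and the "cross" branch is only a hypothetical stand-in (a sum of average- and max-pooled features); the paper's actual cross pooling operator is defined in the full text.

import torch
import torch.nn as nn

class PoolingHead(nn.Module):
    """Classification head that replaces the class-token readout
    with a global pooling of the encoder's patch tokens."""

    def __init__(self, dim: int, num_classes: int, mode: str = "cross"):
        super().__init__()
        self.mode = mode
        self.mlp = nn.Linear(dim, num_classes)  # classification head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) encoder output, no class token prepended
        if self.mode == "avg":            # global average pooling baseline
            rep = tokens.mean(dim=1)
        elif self.mode == "max":          # global max pooling baseline
            rep = tokens.max(dim=1).values
        else:
            # Hypothetical stand-in for cross pooling: combine average-
            # and max-pooled statistics. The paper's operator may differ.
            rep = tokens.mean(dim=1) + tokens.max(dim=1).values
        return self.mlp(rep)              # (B, num_classes) logits

# Usage: pool the encoder output instead of reading the class token.
encoder_out = torch.randn(8, 196, 768)   # e.g. ViT-B/16 on a 224x224 input
head = PoolingHead(dim=768, num_classes=1000, mode="cross")
logits = head(encoder_out)               # shape: (8, 1000)

Because the pooling acts on tokens the encoder already produces, this readout adds no extra attention computation, which is consistent with the abstract's claim of no added computational cost.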
Pages: 491-496
Page count: 6
Related papers
50 items total
  • [22] Online Continual Learning with Contrastive Vision Transformer
    Wang, Zhen; Liu, Liu; Kong, Yajing; Guo, Jiaxian; Tao, Dacheng
    COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680: 631-650
  • [23] Binary representation learning in computer vision
    Shen, Fumin; Yang, Yang; Zhang, Hanwang
    NEUROCOMPUTING, 2016, 213: 1-4
  • [24] UGTransformer: Unsupervised Graph Transformer Representation Learning
    Xu, Lixiang; Liu, Haifeng; Cui, Qingzhe; Luo, Bin; Li, Ning; Chen, Yan; Tang, Yuanyan
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [25] Graph Propagation Transformer for Graph Representation Learning
    Chen, Zhe; Tan, Hao; Wang, Tao; Shen, Tianrun; Lu, Tong; Peng, Qiuying; Cheng, Cheng; Qi, Yue
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 3559-3567
  • [26] Transformer-Exclusive Cross-Modal Representation for Vision and Language
    Shin, Andrew; Narihira, Takuya
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 2719-2725
  • [27] CommPOOL: An interpretable graph pooling framework for hierarchical graph representation learning
    Tang, Haoteng; Ma, Guixiang; He, Lifang; Huang, Heng; Zhan, Liang
    NEURAL NETWORKS, 2021, 143: 669-677
  • [28] DIPool: Degree-Induced Pooling for Hierarchical Graph Representation Learning
    Yu, Hualei; Yao, Yirong; Yuan, Jinliang; Wang, Chongjun
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022: 220-227
  • [29] DMSPool: Dual Multi-Scale Pooling for Graph Representation Learning
    Yu, Hualei; Luo, Chong; Du, Yuntao; Cheng, Hao; Cao, Meng; Wang, Chongjun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT I, 2021, 12681: 375-384
  • [30] An Attention Pooling based Representation Learning Method for Speech Emotion Recognition
    Li, Pengcheng; Song, Yan; McLoughlin, Ian; Guo, Wu; Dai, Lirong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3087-3091