Integrated crossing pooling of representation learning for Vision Transformer

Times cited: 0

Authors:

Xu, Libo [1]

Li, Xingsen [2]

Huang, Zhenrui [1]

Sun, Yucheng [3]

Wang, Jiagong [1]

Affiliations:

[1] NingboTech Univ, Ningbo, Peoples R China

[2] Guangdong Univ Technol, Guangzhou, Peoples R China

[3] China E Port Data Ctr, Ningbo Branch, Ningbo, Peoples R China

Source:

PROCEEDINGS OF 2021 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY WORKSHOPS AND SPECIAL SESSIONS: (WI-IAT WORKSHOP/SPECIAL SESSION 2021) | 2021

Keywords:

vision transformer; ViT; Pooling method; class token;

DOI:

10.1145/3498851.3499004

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, Transformer models such as ViT have been widely developed for computer vision. In the ViT model, a learnable class token is prepended to the token sequence. The output of the class token after the full Transformer encoder is taken as the final representation vector, which is then passed through a multi-layer perceptron (MLP) to produce the classification prediction. The class token can thus be seen as an aggregation of the information in all other tokens. However, we argue that global pooling over the tokens can aggregate this information more effectively and intuitively. In this paper, we propose a new pooling method, called cross pooling, that replaces the class token in producing the representation vector of the input image; it extracts better features and effectively improves model performance without increasing the computational cost. Through extensive experiments, we demonstrate that cross pooling achieves significant improvements over both the original class token and existing global pooling methods such as average pooling and max pooling.
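The abstract contrasts ways of turning the encoder's output token sequence into a single representation vector. The following is a minimal pure-Python sketch of the class-token readout and the two global-pooling baselines the abstract names (average and max pooling). The function names, the toy 2-dimensional tokens, the assumption that pooling skips the class token, and the single linear layer standing in for the MLP head are all hypothetical illustrations. The paper's cross pooling method is not reproduced, because the abstract does not specify how it works.

```python
from typing import List

Vector = List[float]
Tokens = List[Vector]  # N tokens, each a D-dimensional embedding


def class_token_readout(encoded: Tokens) -> Vector:
    """ViT-style readout: the encoder output at position 0, where the
    learnable class token was prepended to the patch-token sequence."""
    return list(encoded[0])


def average_pool(patch_tokens: Tokens) -> Vector:
    """Global average pooling: mean of each embedding dimension over all tokens."""
    n = len(patch_tokens)
    return [sum(col) / n for col in zip(*patch_tokens)]


def max_pool(patch_tokens: Tokens) -> Vector:
    """Global max pooling: maximum of each embedding dimension over all tokens."""
    return [max(col) for col in zip(*patch_tokens)]


def linear_head(rep: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Hypothetical single linear layer mapping a D-dim representation to
    class logits; a simplified stand-in for the MLP head."""
    return [sum(w * x for w, x in zip(row, rep)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    # Hypothetical encoder output: 1 class token + 3 patch tokens, D = 2.
    encoded = [
        [0.5, -0.5],  # class token (position 0)
        [1.0, 2.0],
        [3.0, 0.0],
        [2.0, 4.0],
    ]
    patches = encoded[1:]  # assumption: pooling uses only the patch tokens
    print(class_token_readout(encoded))  # [0.5, -0.5]
    print(average_pool(patches))         # [2.0, 2.0]
    print(max_pool(patches))             # [3.0, 4.0]
    print(linear_head(average_pool(patches), [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [2.0, 2.0]
```

All three readouts produce a D-dimensional vector, so any of them can feed the same classification head; the choice only changes how information from the token sequence is aggregated, which is the design axis the paper investigates.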

Pages: 491–496

Page count: 6
