Multi-Scale Self-Attention for Text Classification

Cited by: 0
Authors
Guo, Qipeng [1 ,2 ]
Qiu, Xipeng [1 ,2 ]
Liu, Pengfei [1 ,2 ]
Xue, Xiangyang [1 ,2 ]
Zhang, Zheng [3 ,4 ]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[3] AWS Shanghai AI Lab, Shanghai, Peoples R China
[4] New York Univ Shanghai, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce prior knowledge of multi-scale structure into self-attention modules. We propose a Multi-Scale Transformer that uses multi-scale multi-head self-attention to capture features at different scales. Based on a linguistic perspective and an analysis of a Transformer (BERT) pre-trained on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results on three different kinds of tasks (21 datasets) show that our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate-size datasets.
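To make the abstract's core mechanism concrete, below is a minimal PyTorch sketch of multi-scale multi-head self-attention, assuming each head is restricted to a local window whose width is that head's "scale", with an optional unrestricted (global) head. The class name MultiScaleSelfAttention, the scales argument, and the band-mask construction are illustrative assumptions for this sketch, not the authors' released code; the paper's layer-wise scale-distribution strategy is not reproduced here.

```python
# Hypothetical sketch: each attention head attends only within a window of a
# head-specific size ("scale"), so different heads capture features at
# different granularities. Illustration only, not the authors' implementation.
import math
import torch
import torch.nn as nn


class MultiScaleSelfAttention(nn.Module):
    def __init__(self, d_model, scales):
        # `scales` gives one window size per head, e.g. [3, 7, 15, None],
        # where None marks an unrestricted (global) head.
        super().__init__()
        self.num_heads = len(scales)
        assert d_model % self.num_heads == 0
        self.d_head = d_model // self.num_heads
        self.scales = scales
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # (b, n, d_model) -> (b, heads, n, d_head)
            return t.view(b, n, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # (b, h, n, n)

        # One band mask per head: True = blocked (outside that head's window).
        pos = torch.arange(n, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs()  # pairwise token distances
        masks = []
        for w in self.scales:
            if w is None:
                masks.append(torch.zeros(n, n, device=x.device, dtype=torch.bool))
            else:
                masks.append(dist > w // 2)
        mask = torch.stack(masks)  # (heads, n, n)
        scores = scores.masked_fill(mask[None], float("-inf"))

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)  # merge heads
        return self.out(out)


if __name__ == "__main__":
    layer = MultiScaleSelfAttention(d_model=64, scales=[3, 7, 15, None])
    x = torch.randn(2, 20, 64)
    print(layer(x).shape)  # torch.Size([2, 20, 64])
```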
Pages: 7847-7854
Number of pages: 8
Related papers
50 records in total
  • [1] Multi-scale self-attention mixup for graph classification
    Kong, Youyong
    Li, Jiaxing
    Zhang, Ke
    Wu, Jiasong
    [J]. PATTERN RECOGNITION LETTERS, 2023, 168 : 100 - 106
  • [2] Deformable Self-Attention for Text Classification
    Ma, Qianli
    Yan, Jiangyue
    Lin, Zhenxi
    Yu, Liuhong
    Chen, Zipeng
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1570 - 1581
  • [3] Multi-Scale Self-Attention Network for Denoising Medical Images
    Lee, Kyungsu
    Lee, Haeyun
    Lee, Moon Hwan
    Chang, Jin Ho
    Kuo, C. -C. Jay
    Oh, Seung-June
    Woo, Jonghye
    Hwang, Jae Youn
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (05) : 1 - 26
  • [4] A Multi-Scale Self-Attention Network to Discriminate Pulmonary Nodules
    Moreno, Alejandra
    Rueda, Andrea
    Martinez, Fabio
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022,
  • [5] Shunted Self-Attention via Multi-Scale Token Aggregation
    Ren, Sucheng
    Zhou, Daquan
    He, Shengfeng
    Feng, Jiashi
    Wang, Xinchao
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10843 - 10852
  • [6] Video Salient Object Detection Using Multi-Scale Self-Attention
    Liu, Jiahao
    [J]. Institute of Electrical and Electronics Engineers Inc.
  • [7] MSGSA: Multi-Scale Guided Self-Attention Network for Crowd Counting
    Sun, Yange
    Li, Meng
    Guo, Huaping
    Zhang, Li
    [J]. ELECTRONICS, 2023, 12 (12)
  • [8] Crowd counting using a self-attention multi-scale cascaded network
    Li, He
    Zhang, Shihui
    Kong, Weihang
    [J]. IET COMPUTER VISION, 2019, 13 (06) : 556 - 561
  • [9] Multi-Scale Visual Semantics Aggregation with Self-Attention for End-to-End Image-Text Matching
    Zheng, Zhuobin
    Ben, Youcheng
    Yuan, Chun
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 940 - 955
  • [10] A Multi-scale Convolutional Attention Based GRU Network for Text Classification
    Tang, Xianlun
    Chen, Yingjie
    Dai, Yuyan
    Xu, Jin
    Peng, Deguang
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 3009 - 3013