Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models

Cited by: 0
Authors
Chang, Tyler A. [1,2]
Xu, Yifan [1]
Xu, Weijian [1]
Tu, Zhuowen [1]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Halicioglu Data Sci Inst, La Jolla, CA 92093 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, we detail the relationship between convolutions and self-attention in natural language tasks. We show that relative position embeddings in self-attention layers are equivalent to recently proposed dynamic lightweight convolutions, and we consider multiple new ways of integrating convolutions into Transformer self-attention. Specifically, we propose composite attention, which unites previous relative position embedding methods under a convolutional framework. We conduct experiments by training BERT with composite attention, finding that convolutions consistently improve performance on multiple downstream tasks while replacing absolute position embeddings. To inform future work, we present results comparing lightweight convolutions, dynamic convolutions, and depthwise-separable convolutions in language model pretraining, considering multiple injection points for convolutions in self-attention layers.
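The central equivalence claimed in the abstract can be checked numerically: if every attention score depends only on the relative offset j - i (i.e. the content-based query-key term is dropped), the softmax-normalized attention matrix is Toeplitz, and applying it to the value vectors is exactly a lightweight convolution with a shared, softmax-normalized kernel slid over the sequence. The sketch below, in plain NumPy, is illustrative only and not the authors' code; the names (seq_len, d_model, window, rel_bias) are assumptions, and the query-dependence that would turn the kernel into a dynamic convolution is deliberately omitted.

    # Minimal numerical sketch (not the authors' code): attention scores that
    # depend only on the offset j - i act as a lightweight convolution.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, window = 8, 16, 3           # window: max relative offset

    values = rng.normal(size=(seq_len, d_model))  # value vectors V
    rel_bias = rng.normal(size=2 * window + 1)    # one score per offset in [-w, w]

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Attention path: scores[i, j] depend only on the offset j - i (clipped window).
    scores = np.full((seq_len, seq_len), -1e9)
    for i in range(seq_len):
        for j in range(seq_len):
            if abs(j - i) <= window:
                scores[i, j] = rel_bias[j - i + window]
    attn_out = softmax(scores, axis=-1) @ values

    # Convolution path: the same softmax-normalized kernel slid over the sequence.
    conv_out = np.zeros_like(values)
    for i in range(seq_len):
        j_lo, j_hi = max(0, i - window), min(seq_len, i + window + 1)
        kernel = softmax(rel_bias[(j_lo - i + window):(j_hi - i + window)])
        conv_out[i] = kernel @ values[j_lo:j_hi]

    print(np.allclose(attn_out, conv_out))  # True: both paths give identical outputs

Making rel_bias a function of the query at position i (rather than a fixed vector) would recover the dynamic, query-dependent kernel discussed in the abstract.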
Pages: 4322-4333
Page count: 12
Related papers
50 records in total
  • [1] Causal Interpretation of Self-Attention in Pre-Trained Transformers
    Rohekar, Raanan Y.
    Gurwicz, Yaniv
    Nisimov, Shami
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Self-conditioning Pre-Trained Language Models
    Suau, Xavier
    Zappella, Luca
    Apostoloff, Nicholas
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [3] Interpreting Art by Leveraging Pre-Trained Models
    Penzel, Niklas
    Denzler, Joachim
    [J]. 2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [4] ARC: A Layer Replacement Compression Method Based on Fine-Grained Self-Attention Distillation for Compressing Pre-Trained Language Models
    Yu, Daohan
    Qiu, Liqing
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
  • [5] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    [J]. ENGINEERING, 2023, 25 : 51 - 65
  • [6] Integrating the Pre-trained Item Representations with Reformed Self-attention Network for Sequential Recommendation
    Liang, Guanzhong
    Liao, Jie
    Zhou, Wei
    Wen, Junhao
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2022), 2022, : 27 - 36
  • [7] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503
  • [8] LaoPLM: Pre-trained Language Models for Lao
    Lin, Nankai
    Fu, Yingwen
    Yang, Ziyu
    Chen, Chuwei
    Jiang, Shengyi
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6506 - 6512
  • [9] PhoBERT: Pre-trained language models for Vietnamese
    Dat Quoc Nguyen
    Anh Tuan Nguyen
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1037 - 1042
  • [10] HinPLMs: Pre-trained Language Models for Hindi
    Huang, Xixuan
    Lin, Nankai
    Li, Kexin
    Wang, Lianxi
    Gan, Suifu
    [J]. 2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 241 - 246