Understanding Self-attention Mechanism via Dynamical System Perspective

Cited by: 3
Authors
Huang, Zhongzhan [1]
Liang, Mingfu [2]
Qin, Jinghui [3]
Zhong, Shanshan [1]
Lin, Liang [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Northwestern Univ, Evanston, IL USA
[3] Guangdong Univ Technol, Guangzhou, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00136
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The self-attention mechanism (SAM) is widely used across fields of artificial intelligence and has successfully boosted the performance of many models. However, current explanations of this mechanism rest mainly on intuition and experience, and a direct model of how the SAM improves performance is still lacking. To mitigate this issue, in this paper we start from the dynamical-system view of residual neural networks and first show that the intrinsic stiffness phenomenon (SP), familiar from the high-precision solution of ordinary differential equations (ODEs), also widely exists in high-performance neural networks (NNs). An NN's ability to measure SP at the feature level is therefore necessary for high performance and is an important factor in the difficulty of training NNs. Analogous to the adaptive step-size methods that are effective for solving stiff ODEs, we show that the SAM acts as a stiffness-aware step-size adaptor: it enhances the model's representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values. This provides a new understanding of why and how the SAM benefits model performance. The same perspective also explains the lottery ticket hypothesis in the SAM, yields new quantitative metrics of representational ability, and inspires a new theory-driven approach, StepNet. Extensive experiments on several popular benchmarks demonstrate that StepNet can extract fine-grained stiffness information and measure SP accurately, leading to significant improvements on various visual tasks.
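To make the dynamical-system analogy concrete: a residual block computes x_{t+1} = x_t + f(x_t), which is exactly one forward-Euler step of the ODE dx/dt = f(x) with step size 1, and an attention value multiplying the residual branch, x_{t+1} = x_t + a(x_t) * f(x_t), plays the role of a state-dependent step size. The sketch below is a minimal PyTorch illustration of this reading, not the paper's StepNet; the SE-style channel gate standing in for the self-attention module, and all module names, are assumptions.

```python
# Minimal sketch (assumption: a generic SE-style channel gate stands in
# for the self-attention module; this is NOT the paper's StepNet).
import torch
import torch.nn as nn

class EulerResidualBlock(nn.Module):
    """Plain residual block: x_{t+1} = x_t + f(x_t),
    i.e., one forward-Euler ODE step with fixed step size 1."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.f(x)

class AdaptiveStepResidualBlock(nn.Module):
    """Attention-gated block: x_{t+1} = x_t + a(x_t) * f(x_t).
    The gate a(x_t) in (0, 1) acts as a state-dependent, per-channel
    step size, analogous to adaptive step-size control for stiff ODEs."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Squeeze-and-excitation-style gate: one "step size" per channel.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        a = self.gate(x)           # adaptive step size a(x_t), shape (B, C, 1, 1)
        return x + a * self.f(x)   # gated (adaptive-step) Euler update

if __name__ == "__main__":
    x = torch.randn(2, 16, 8, 8)
    print(AdaptiveStepResidualBlock(16)(x).shape)  # torch.Size([2, 16, 8, 8])
```

Under this analogy, a gate value near zero corresponds to taking a very small step through a stiff region of the feature trajectory, while a value near one recovers the plain fixed-step residual update; per the abstract, StepNet's contribution is a finer-grained estimate of the stiffness information that drives such a gate.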
Pages: 1412-1422
Page count: 11
Related Papers
50 items in total
  • [1] A Crowded Object Counting System with Self-Attention Mechanism
    Lien, Cheng-Chang
    Wu, Pei-Chen
    Sensors, 2024, 24 (20)
  • [2] Focus of Attention in Groups - A Self-Attention Perspective
    Mullen, B.
    Chapman, J. G.
    Peaugh, S.
    Journal of Social Psychology, 1989, 129 (06): 807-817
  • [3] Self-attention Enhanced Patient Journey Understanding in Healthcare System
    Peng, Xueping
    Long, Guodong
    Shen, Tao
    Wang, Sen
    Jiang, Jing
    Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020, Pt III, 2021, 12459: 719-735
  • [4] Understanding User Preferences in Location-Based Social Networks via a Novel Self-Attention Mechanism
    Shi, Lei
    Luo, Jia
    Zhang, Peiying
    Han, Hongqi
    El Baz, Didier
    Cheng, Gang
    Liang, Zeyu
    Sustainability, 2022, 14 (24)
  • [5] Adverse drug reaction detection via a multihop self-attention mechanism
    Zhang, Tongxuan
    Lin, Hongfei
    Ren, Yuqi
    Yang, Liang
    Xu, Bo
    Yang, Zhihao
    Wang, Jian
    Zhang, Yijia
    BMC Bioinformatics, 2019, 20 (01)
  • [6] Blockwise Self-Attention for Long Document Understanding
    Qiu, Jiezhong
    Ma, Hao
    Levy, Omer
    Yih, Wen-tau
    Wang, Sinong
    Tang, Jie
    Findings of the Association for Computational Linguistics, EMNLP 2020, 2020: 2555-2565
  • [7] An Ego Network Embedding Model via Neighbors Sampling and Self-attention Mechanism
    Guo, Ziyu
    Liu, Shijun
    Pan, Li
    He, Qiang
    2020 IEEE Intl Symp on Parallel & Distributed Processing with Applications, Intl Conf on Big Data & Cloud Computing, Intl Symp Social Computing & Networking, Intl Conf on Sustainable Computing & Communications (ISPA/BDCloud/SocialCom/SustainCom 2020), 2020: 425-432
  • [8] Unsupervised Pansharpening Based on Self-Attention Mechanism
    Qu, Ying
    Baghbaderani, Razieh Kaviani
    Qi, Hairong
    Kwan, Chiman
    IEEE Transactions on Geoscience and Remote Sensing, 2021, 59 (04): 3192-3208
  • [9] Linear Complexity Randomized Self-attention Mechanism
    Zheng, Lin
    Wang, Chong
    Kong, Lingpeng
    International Conference on Machine Learning, Vol 162, 2022