Recurrent Memory Transformer

Cited by: 0
Authors
Bulatov, Aydar [1 ]
Kuratov, Yuri [1 ,2 ]
Burtsev, Mikhail S. [1 ,2 ]
Affiliations
[1] Moscow Inst Phys & Technol, Neural Networks & Deep Learning Lab, Dolgoprudnyi, Russia
[2] AIRI, Moscow, Russia
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows combining information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information and, with the help of recurrence, to pass information between segments of a long sequence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence; the model is then trained to control both memory operations and sequence representation processing. Experimental results show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and general-purpose in-memory processing, such as algorithmic tasks and reasoning.
Pages: 13
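
To make the memory mechanism described in the abstract concrete, below is a minimal sketch in PyTorch of segment-level recurrence with memory tokens: learnable memory vectors are prepended to each segment's input of an otherwise unmodified Transformer, and their updated hidden states are carried over to the next segment. This is an illustrative sketch, not the authors' implementation; names such as RMTSketch, num_mem_tokens, and segment_len, and all hyperparameter values, are assumptions, and it keeps only the simplest prepend-only variant (the abstract also mentions placing memory tokens in the output sequence).

```python
# Illustrative sketch of segment-level recurrence with memory tokens.
# Not the authors' code; class and argument names are hypothetical.
import torch
import torch.nn as nn


class RMTSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, num_mem_tokens=4,
                 nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable initial memory tokens, shared across sequences.
        self.mem_tokens = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # The backbone is an unmodified Transformer encoder: memory is
        # realized purely as extra tokens in its input.
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.num_mem_tokens = num_mem_tokens

    def forward(self, token_ids, segment_len=32):
        batch = token_ids.size(0)
        # Start every sequence from the learned initial memory state.
        memory = self.mem_tokens.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        # Process the long sequence segment by segment (recurrence).
        for start in range(0, token_ids.size(1), segment_len):
            segment = self.embed(token_ids[:, start:start + segment_len])
            # Prepend the current memory tokens to the segment's input.
            hidden = self.backbone(torch.cat([memory, segment], dim=1))
            # The hidden states at the memory positions become the memory
            # passed to the next segment; gradients flow through them.
            memory = hidden[:, :self.num_mem_tokens]
            outputs.append(hidden[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory


# Usage: a toy batch of two sequences of length 96, split into 3 segments.
model = RMTSketch()
ids = torch.randint(0, 1000, (2, 96))
out, final_memory = model(ids)
print(out.shape, final_memory.shape)  # (2, 96, 128) and (2, 4, 128)
```

In this sketch a length-96 input is processed as three segments of 32 tokens; because the carried memory states are differentiable, gradients from later segments can reach earlier ones when the recurrence is trained with backpropagation through time.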