Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network

Cited by: 1820
Author
Sherstinsky, Alex
Affiliation
Keywords
RNN; RNN unfolding/unrolling; LSTM; External input gate; Convolutional input context windows; BACKPROPAGATION;
DOI
10.1016/j.physd.2019.132306
CLC number (Chinese Library Classification)
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an RNN is routinely presented without justification throughout the literature. The goal of this tutorial is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in Signal Processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the "Vanilla LSTM" network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasizes ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this treatise valuable as well. (C) 2019 Elsevier B.V. All rights reserved.
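
The abstract refers to the Vanilla LSTM equations but, as a database record, does not reproduce them. For orientation only, below is a minimal NumPy sketch of one forward step of a standard Vanilla LSTM cell (forget, input, and output gates plus a candidate cell update), unrolled over a toy input sequence. This follows the textbook formulation rather than Sherstinsky's notation, and it omits the paper's augmentations (the external input gate and the convolutional input context windows); all variable names and dimensions below are illustrative assumptions.

import numpy as np

def sigmoid(x):
    # Logistic squashing function used by the three gates.
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    # One forward step of a standard Vanilla LSTM cell.
    # p holds input weights W_*, recurrent weights U_*, and biases b_*
    # for the forget (f), input (i), output (o) gates and candidate update (g).
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])   # candidate cell state
    c = f * c_prev + i * g        # new cell (memory) state
    h = o * np.tanh(c)            # new hidden state (cell output)
    return h, c

# Toy usage: unroll the cell over T random input vectors.
rng = np.random.default_rng(0)
d_x, d_h, T = 4, 3, 5             # illustrative input size, state size, sequence length
p = {}
for gate in ("f", "i", "o", "g"):
    p["W_" + gate] = 0.1 * rng.standard_normal((d_h, d_x))
    p["U_" + gate] = 0.1 * rng.standard_normal((d_h, d_h))
    p["b_" + gate] = np.zeros(d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(T):
    h, c = lstm_step(rng.standard_normal(d_x), h, c, p)
print(h)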
Pages: 28
Related papers
50 records in total
  • [41] A Deep Neural Network Model for Short-Term Load Forecast Based on Long Short-Term Memory Network and Convolutional Neural Network
    Tian, Chujie
    Ma, Jian
    Zhang, Chunhong
    Zhan, Panpan
    ENERGIES, 2018, 11 (12)
  • [42] Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network
    Kong, Weicong
    Dong, Zhao Yang
    Jia, Youwei
    Hill, David J.
    Xu, Yan
    Zhang, Yuan
    IEEE TRANSACTIONS ON SMART GRID, 2019, 10 (01) : 841 - 851
  • [43] Short-term neural network memory
    Morris, Robert J.T.
    Wong, Wing Shing
SIAM Journal on Computing, 1988, 17 (06): 1103 - 1118
  • [45] Application of Long Short-Term Memory (LSTM) Neural Network for the estimation of communication network delay in smart grid applications
    Feizimirkhani, Ronak
Nguyen, Van Hoa
    Besanger, Yvon
Tran, Quoc Tuan
    Bratcu, Antoneta Iuliana
    Labonne, Antoine
    Braconnier, Thierry
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON ENVIRONMENT AND ELECTRICAL ENGINEERING AND 2021 5TH IEEE INDUSTRIAL AND COMMERCIAL POWER SYSTEMS EUROPE (EEEIC/I&CPS EUROPE), 2021,
  • [46] An FPGA Implementation of a Long Short-Term Memory Neural Network
    Ferreira, Joao Canas
    Fonseca, Jose
    2016 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG16), 2016,
  • [47] Long short-term memory neural network for glucose prediction
    Carrillo-Moreno, Jaime
    Perez-Gandia, Carmen
    Sendra-Arranz, Rafael
    Garcia-Saez, Gema
    Hernando, M. Elena
    Gutierrez, Alvaro
NEURAL COMPUTING & APPLICATIONS, 2021, 33 (09): 4191 - 4203
  • [49] Recurrent Network Model of the Neural Mechanism of Short-Term Active Memory
    Zipser, David
    NEURAL COMPUTATION, 1991, 3 (02) : 179 - 193
  • [50] Short-term memory for serial order: A recurrent neural network model
    Botvinick, MM
    Plaut, DC
    PSYCHOLOGICAL REVIEW, 2006, 113 (02) : 201 - 233