Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

Cited by: 827
Authors
Donahue, Jeff [1]
Hendricks, Lisa Anne [1]
Rohrbach, Marcus [1, 2]
Venugopalan, Subhashini [3]
Guadarrama, Sergio [1]
Saenko, Kate [4]
Darrell, Trevor [1, 2]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Int Comp Sci Inst, Berkeley, CA 94720 USA
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[4] Univ Massachusetts Lowell, Dept Comp Sci, Lowell, MA 01852 USA
Keywords
Computer vision; convolutional nets; deep learning; transfer learning
DOI
10.1109/TPAMI.2016.2599174
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they learn compositional representations in space and time. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Differentiable recurrent models are appealing in that they can directly map variable-length inputs (e.g., videos) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent sequence models are directly connected to modern visual convolutional network models and can be jointly trained to learn temporal dynamics and convolutional perceptual representations. Our results show that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined or optimized.
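A minimal pure-Python sketch of the "doubly deep" idea described in the abstract: a convolutional feature extractor applied to each frame (depth in space) feeds a recurrent LSTM state update (depth in time), and per-frame class probabilities are averaged into one clip-level prediction. Everything here is an illustrative assumption, not the authors' implementation: a single 3x3 convolution with ReLU and global average pooling stands in for the per-frame CNN, one LSTM layer stands in for the recurrent model, averaging is one simple way to label a whole clip, and all function names (`conv_features`, `lstm_step`, `lrcn_predict`), sizes, and the random untrained weights are hypothetical.

```python
import math
import random

# Plain nested lists stand in for tensors (hypothetical toy setup).
Matrix = list[list[float]]
Vector = list[float]


def conv_features(frame: Matrix, kernels: list[Matrix]) -> Vector:
    """Hypothetical per-frame 'CNN': valid 2-D convolution (cross-correlation,
    as in most CNN libraries) with each kernel, ReLU, then global average
    pooling, giving one feature per kernel."""
    h, w = len(frame), len(frame[0])
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        total, count = 0.0, 0
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                s = sum(k[a][b] * frame[i + a][j + b]
                        for a in range(kh) for b in range(kw))
                total += max(0.0, s)  # ReLU
                count += 1
        feats.append(total / count)  # global average pooling
    return feats


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: Vector, h: Vector, c: Vector, params: dict) -> tuple:
    """One LSTM state update. params maps gate name ('i', 'f', 'o', 'g')
    to (W, b), where W acts on the concatenation [x_t, h_{t-1}]."""
    xh = x + h  # list concatenation
    pre = {name: [s + b for s, b in zip(matvec(W, xh), bias)]
           for name, (W, bias) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell input
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def lrcn_predict(frames, kernels, lstm_params, w_out, hidden):
    """Run conv features -> LSTM over the frames; return per-frame class
    probabilities and their average as a clip-level prediction."""
    h, c = [0.0] * hidden, [0.0] * hidden
    per_step = []
    for frame in frames:
        x = conv_features(frame, kernels)           # spatial features
        h, c = lstm_step(x, h, c, lstm_params)      # temporal state update
        per_step.append(softmax(matvec(w_out, h)))  # per-frame class probs
    n_classes = len(w_out)
    clip = [sum(p[k] for p in per_step) / len(per_step) for k in range(n_classes)]
    return per_step, clip


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy usage with random, untrained weights (all sizes are arbitrary).
rng = random.Random(0)
n_kernels, hidden, n_classes, T = 4, 8, 3, 5
kernels = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
lstm_params = {name: (rand_matrix(hidden, n_kernels + hidden, rng), [0.0] * hidden)
               for name in "ifog"}
w_out = rand_matrix(n_classes, hidden, rng)
frames = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(T)]
per_step, clip_probs = lrcn_predict(frames, kernels, lstm_params, w_out, hidden)
print([round(p, 3) for p in clip_probs])
```

Because the same LSTM weights are reused at every time step, the sketch accepts clips of any length. The abstract stresses that the convolutional and recurrent parts are trained jointly with backpropagation; this forward-only sketch omits training entirely.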
Pages: 677-691
Page count: 15
Related papers (50 in total)
  • [1] Long-term Recurrent Convolutional Networks for Visual Recognition and Description
    Donahue, Jeff
    Hendricks, Lisa Anne
    Guadarrama, Sergio
    Rohrbach, Marcus
    Venugopalan, Subhashini
    Saenko, Kate
    Darrell, Trevor
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015: 2625-2634
  • [2] Online Learning Engagement Recognition Using Bidirectional Long-Term Recurrent Convolutional Networks
    Ma, Yujian
    Wei, Yantao
    Shi, Yafei
    Li, Xiuhan
    Tian, Yi
    Zhao, Zhongjin
    [J]. SUSTAINABILITY, 2023, 15 (01)
  • [3] ARCH: Adaptive recurrent-convolutional hybrid networks for long-term action recognition
    Xin, Miao
    Zhang, Hong
    Wang, Helong
    Sun, Mingui
    Yuan, Ding
    [J]. NEUROCOMPUTING, 2016, 178: 87-102
  • [4] Synthesizing Dynamic MRI Using Long-Term Recurrent Convolutional Networks
    Preiswerk, Frank
    Cheng, Cheng-Chieh
    Luo, Jie
    Madore, Bruno
    [J]. MACHINE LEARNING IN MEDICAL IMAGING: 9TH INTERNATIONAL WORKSHOP, MLMI 2018, 2018, 11046: 89-97
  • [5] Long-term recurrent convolutional network violent Behaviour recognition with attention mechanism
    Liang, Qiming
    Li, Yong
    Yang, Kaikai
    Wang, Xipeng
    Li, Zhi
    [J]. 2020 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE COMMUNICATION AND NETWORK SECURITY (CSCNS2020), 2021, 336
  • [6] Automated Lubrication Systems Prognostics Using Long-Term Recurrent Convolutional Networks
    Warner, Chloe
    Desmet, Antoine
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT (ICPHM), 2018
  • [7] Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition
    Khorram, Soheil
    Aldeneh, Zakaria
    Dimitriadis, Dimitrios
    McInnis, Melvin
    Provost, Emily Mower
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1253-1257
  • [8] Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
    Khor, Huai-Qian
    See, John
    Phan, Raphael C. W.
    Lin, Weiyao
    [J]. PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018: 667-674
  • [9] Multi-Directional Long-Term Recurrent Convolutional Network for Road Situation Recognition
    Dofitas Jr, Cyreneo
    Gil, Joon-Min
    Byun, Yung-Cheol
    [J]. SENSORS, 2024, 24 (14)
  • [10] Long-term Visual Place Recognition
    Alijani, Farid
    Peltomaki, Jukka
    Puura, Jussi
    Huttunen, Heikki
    Kamarainen, Joni-Kristian
    Rahtu, Esa
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 3422-3428