Learning deep spatiotemporal features for video captioning

Cited by: 9
Authors
Daskalakis, Eleftherios [1 ]
Tzelepi, Maria [1 ]
Tefas, Anastasios [1 ]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece
DOI
10.1016/j.patrec.2018.09.022
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel automatic video captioning system that translates videos into sentences, utilizing a deep neural network composed of three building blocks of convolutional and recurrent structure. The first subnetwork operates as a feature extractor for single frames. The second subnetwork is a three-stream network capable of capturing spatial semantic information in the first stream, temporal semantic information in the second stream, and global video concept information in the third stream. The third subnetwork generates relevant textual captions using the spatiotemporal features of the second subnetwork as input. The experimental validation indicates the effectiveness of the proposed model, achieving superior performance over competitive methods. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 143-149
Page count: 7
Related Papers
50 items in total
  • [31] Learning Video-Text Aligned Representations for Video Captioning
    Shi, Yaya
    Xu, Haiyang
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    Zha, Zheng-Jun
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [32] Video Captioning via Hierarchical Reinforcement Learning
    Wang, Xin
    Chen, Wenhu
    Wu, Jiawei
    Wang, Yuan-Fang
    Wang, William Yang
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4213 - 4222
  • [33] Image Captioning Using Deep Learning
    Adithya, Paluvayi Veera
    Kalidindi, Mourya Viswanadh
    Swaroop, Nallani Jyothi
    Vishwas, H. N.
    [J]. ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 42 - 58
  • [34] Image Captioning using Deep Learning
    Jain, Yukti Sanjay
    Dhopeshwar, Tanisha
    Chadha, Supreet Kaur
    Pagire, Vrushali
    [J]. 2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021,
  • [35] AENet: Learning Deep Audio Features for Video Analysis
    Takahashi, Naoya
    Gygli, Michael
    Van Gool, Luc
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (03) : 513 - 524
  • [36] DIABETIC RETINOPATHY DETECTION AND CAPTIONING BASED ON LESION FEATURES USING DEEP LEARNING APPROACH
    Amalia, Rizka
    Bustamam, Alhadi
    Yudantha, Anggun Rama
    Victor, Andi Arus
    [J]. COMMUNICATIONS IN MATHEMATICAL BIOLOGY AND NEUROSCIENCE, 2021,
  • [37] Semi-Supervised Learning for Video Captioning
    Lin, Ke
    Gan, Zhuoxin
    Wang, Liwei
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1096 - 1106
  • [38] Learning Hierarchical Modular Networks for Video Captioning
    Li, Guorong
    Ye, Hanhua
    Qi, Yuankai
    Wang, Shuhui
    Qing, Laiyun
    Huang, Qingming
    Yang, Ming-Hsuan
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (02) : 1049 - 1064
  • [39] Video-Based Depression Level Analysis by Encoding Deep Spatiotemporal Features
    Al Jazaery, Mohamad
    Guo, Guodong
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (01) : 262 - 268
  • [40] SEMANTIC LEARNING NETWORK FOR CONTROLLABLE VIDEO CAPTIONING
    Chen, Kaixuan
    Di, Qianji
    Lu, Yang
    Wang, Hanzi
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 880 - 884