Deep Learning for Video Captioning: A Review

Authors
Chen, Shaoxiang [1]
Yao, Ting [3]
Jiang, Yu-Gang [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Info Proc, Shanghai, Peoples R China
[2] Jilian Technol Grp Video, Shanghai, Peoples R China
[3] JD AI Res, Shanghai, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has recently achieved great success in solving specific artificial intelligence problems. Substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. One is to encode a video through a thorough understanding of its content and learn a visual representation. The other is caption generation, which decodes the learned representation into a sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges in this task that are not yet fully understood and present future research directions.
Pages: 6283 - 6290
Number of pages: 8