Deep Learning for Video Captioning: A Review

Authors
Chen, Shaoxiang [1]
Yao, Ting [3]
Jiang, Yu-Gang [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Info Proc, Shanghai, Peoples R China
[2] Jilian Technol Grp Video, Shanghai, Peoples R China
[3] JD AI Res, Shanghai, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has recently achieved great success in solving specific artificial intelligence problems, with substantial progress in Computer Vision (CV) and Natural Language Processing (NLP). As a bridge between vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. The first is encoding: understanding the video thoroughly and learning a visual representation. The second is caption generation, which decodes the learned representation into a sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, and follow with a summary of standard datasets and representative approaches. Finally, we highlight challenges in this task that are not yet fully understood and present future research directions.
Pages: 6283-6290
Page count: 8