Video captioning: a review of theory, techniques and practices

Cited by: 9
Authors
Jain, Vanita [1]
Al-Turjman, Fadi [2]
Chaudhary, Gopal [1]
Nayar, Devang [1]
Gupta, Varun [1]
Kumar, Aayush [1]
Affiliations
[1] Bharati Vidyapeeths Coll Engn, New Delhi, India
[2] Near East Univ, Nicosia, Cyprus
Keywords
Video captioning; Natural language processing; CNN; RNN; Encoder-decoder framework; ATTENTION; LANGUAGE; VISION;
DOI
10.1007/s11042-021-11878-w
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Video captioning is now widely used in applications for people with disabilities, particularly those with visual impairments. Advances in object detection and natural language processing have prompted a surge of work that fuses these two mainstream tasks. One product of this fusion is image captioning, in which a system takes an input image and returns a short description of its content. The approach was first developed for images and was later extended to videos, with some modifications to the existing methods. This paper surveys state-of-the-art video captioning techniques. Researchers worldwide have contributed many results in this domain, so there was a need to compile, study, and analyze them in one comprehensive study, which this paper provides. Video captioning methods are compared across distinct datasets using the evaluation parameters most commonly applied in image and video analysis. The review covers methods published from 2015 to 2019, year by year. The most commonly used datasets and evaluation methods are also shown in a bar graph and a scatter plot for each year and each evaluation parameter. Although video captioning has been analyzed and researched extensively, our survey shows that many problems remain.
Pages: 35619 - 35653
Number of pages: 35
Related papers
50 in total
  • [1] Retraction Note: Video captioning: a review of theory, techniques and practices
    Vanita Jain
    Fadi Al-Turjman
    Gopal Chaudhary
    Devang Nayar
    Varun Gupta
    Aayush Kumar
    [J]. Multimedia Tools and Applications, 2024, 83 (22) : 62493 - 62493
  • [2] A Review Of Video Captioning Methods
    Mahajan, Dewarthi
    Bhosale, Sakshi
    Nighot, Yash
    Tayal, Madhuri
    [J]. INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2021, 12 (05): 708 - 715
  • [3] Deep Learning for Video Captioning: A Review
    Chen, Shaoxiang
    Yao, Ting
    Jiang, Yu-Gang
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 6283 - 6290
  • [4] QAVidCap: Enhancing Video Captioning through Question Answering Techniques
    Liu, Hui
    Wan, Xiaojun
    [J]. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024: 155 - 164
  • [5] Review on Image Captioning and Speech Synthesis Techniques
    Sruthi, K., V
    Meharban, M. S.
    [J]. 2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020: 352 - 356
  • [6] Exploring Video Captioning Techniques: A Comprehensive Survey on Deep Learning Methods
    Islam S.
    Dash A.
    Seum A.
    Raj A.H.
    Hossain T.
    Shah F.M.
    [J]. SN Computer Science, 2021, 2 (2)
  • [7] Image/video captioning
    画像/ビデオのキャプション
    Ushiku, Yoshitaka
    [J]. Inst. of Image Information and Television Engineers, 2018, (72)
  • [8] MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques
    Chen, Sihan
    Zhu, Xinxin
    Hao, Dongze
    Liu, Wei
    Liu, Jiawei
    Zhao, Zijia
    Guo, Longteng
    Liu, Jing
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 4853 - 4857
  • [9] VIDEO CAPTIONING BASED ON JOINT IMAGE-AUDIO DEEP LEARNING TECHNIQUES
    Wang, Chien-Yao
    Liaw, Pei-Sin
    Liang, Kai-Wen
    Wang, Jai-Ching
    Chang, Pao-Chi
    [J]. 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE-BERLIN), 2019: 127 - 131
  • [10] Video Captioning based on Image Captioning as Subsidiary Content
    Vaishnavi, J.
    Narmatha, V
    [J]. 2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022