Video description: A comprehensive survey of deep learning approaches

Cited by: 10
Authors
Rafiq, Ghazala [1 ]
Rafiq, Muhammad [2 ]
Choi, Gyu Sang [1 ]
Affiliations
[1] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
[2] Keimyung Univ, Dept Game & Mobile Engn, 1095 Dalgubeol Daero, Daegu 42601, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; Encoder-Decoder architecture; Text description; Video captioning techniques; Video description approaches; Video captioning; Vision to text; NETWORKS;
DOI
10.1007/s10462-023-10414-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video description refers to understanding visual content and transforming that understanding into automatic textual narration. It bridges the key AI fields of computer vision and natural language processing, with real-time and practical applications. Deep learning-based approaches to video description have demonstrated better results than conventional approaches. The current literature lacks a thorough interpretation of the recently developed sequence-to-sequence techniques for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an Encoder-Decoder architecture that employs a specific composition of CNN, RNN, or the variants LSTM or GRU as the encoder and decoder blocks. This standard architecture can be fused with an attention mechanism to focus on specific distinctive features, achieving high-quality results. Reinforcement learning employed within the Encoder-Decoder structure can progressively deliver state-of-the-art captions by following exploration and exploitation strategies. The transformer mechanism is a modern and efficient transductive architecture for robust output. Free from recurrence and based solely on self-attention, it allows parallelization along with training on a massive amount of data, and it can fully utilize the available GPUs for most NLP tasks. With the recent emergence of several transformer variants, long-term dependency handling is no longer an issue for researchers engaged in video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional purposes. They can get promising directions from this research.
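Illustrative note (not part of the published abstract): the self-attention idea mentioned above can be sketched in a few lines of plain Python. This is a minimal sketch under simplifying assumptions, not the method of the surveyed paper: the learned query, key, and value projections of a real transformer are replaced by the identity, and the names `self_attention`, `softmax`, and `toy_frames`, as well as the feature values, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of equal-length feature vectors, one per video frame.
    Returns (outputs, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    weights = []
    for query in frames:
        # Every query is compared with every key; no step depends on a
        # previous time step, which is what permits parallelization.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all frame vectors.
    outputs = [
        [sum(w * value[d] for w, value in zip(row, frames)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Three toy "frame features" (hypothetical values, not from the paper).
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_frames)
```

Each row of `weights` is a probability distribution over all frames, so distant frames are reachable in a single step rather than through a chain of recurrent states.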
Pages: 13293-13372
Number of pages: 80
Related papers
50 in total
  • [31] A Comprehensive Survey on Community Detection With Deep Learning
    Su, Xing
    Xue, Shan
    Liu, Fanzhen
    Wu, Jia
    Yang, Jian
    Zhou, Chuan
    Hu, Wenbin
    Paris, Cecile
    Nepal, Surya
    Jin, Di
    Sheng, Quan Z.
    Yu, Philip S.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4682 - 4702
  • [32] A survey of deep learning approaches to image restoration
    Su, Jingwen
    Xu, Boyan
    Yin, Hujun
    NEUROCOMPUTING, 2022, 487 : 46 - 65
  • [33] Deep learning approaches to lexical simplification: A survey
    North, Kai
    Ranasinghe, Tharindu
    Shardlow, Matthew
    Zampieri, Marcos
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, : 111 - 134
  • [34] Deep Learning Approaches for Similarity Computation: A Survey
    Yang, Peilun
    Wang, Hanchen
    Yang, Jianye
    Qian, Zhengping
    Zhang, Ying
    Lin, Xuemin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (12) : 7893 - 7912
  • [35] Deep learning approaches for seizure video analysis: A review
    Ahmedt-Aristizabal, David
    Armin, Mohammad Ali
    Hayder, Zeeshan
    Garcia-Cairasco, Norberto
    Petersson, Lars
    Fookes, Clinton
    Denman, Simon
    Mcgonigal, Aileen
    EPILEPSY & BEHAVIOR, 2024, 154
  • [36] Deep Learning Approaches for Video Compression: A Bibliometric Analysis
    Bidwe, Ranjeet Vasant
    Mishra, Sashikala
    Patil, Shruti
    Shaw, Kailash
    Vora, Deepali Rahul
    Kotecha, Ketan
    Zope, Bhushan
    BIG DATA AND COGNITIVE COMPUTING, 2022, 6 (02)
  • [37] A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
    Wang, Zhenyi
    Yang, Enneng
    Shen, Li
    Huang, Heng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 1464 - 1483
  • [38] Deep Neural Approaches to Relation Triplets Extraction: a Comprehensive Survey
    Tapas Nayak
    Navonil Majumder
    Pawan Goyal
    Soujanya Poria
    Cognitive Computation, 2021, 13 : 1215 - 1232
  • [39] Deep Neural Approaches to Relation Triplets Extraction: a Comprehensive Survey
    Nayak, Tapas
    Majumder, Navonil
    Goyal, Pawan
    Poria, Soujanya
    COGNITIVE COMPUTATION, 2021, 13 (05) : 1215 - 1232
  • [40] A Brief Survey of Deep Learning Approaches for Learning Analytics on MOOCs
    Sun, Zhongtian
    Harit, Anoushka
    Yu, Jialin
    Cristea, Alexandra, I
    Shi, Lei
    INTELLIGENT TUTORING SYSTEMS (ITS 2021), 2021, 12677 : 28 - 37