Video description: A comprehensive survey of deep learning approaches

Cited by: 10
Authors
Rafiq, Ghazala [1]
Rafiq, Muhammad [2]
Choi, Gyu Sang [1]
Affiliations
[1] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
[2] Keimyung Univ, Dept Game & Mobile Engn, 1095 Dalgubeol Daero, Daegu 42601, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; Encoder-Decoder architecture; Text description; Video captioning techniques; Video description approaches; Video captioning; Vision to text; NETWORKS;
DOI
10.1007/s10462-023-10414-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video description refers to understanding visual content and transforming that acquired understanding into automatic textual narration. It bridges the key AI fields of computer vision and natural language processing in conjunction with real-time and practical applications. Deep learning-based approaches employed for video description have demonstrated enhanced results compared to conventional approaches. The current literature lacks a thorough interpretation of the recently developed and employed sequence-to-sequence techniques for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an Encoder-Decoder architecture employing a specific composition of CNN, RNN, or the variants LSTM or GRU as encoder and decoder blocks. This standard architecture can be fused with an attention mechanism to focus on specific distinctive features, achieving high-quality results. Reinforcement learning employed within the Encoder-Decoder structure can progressively deliver state-of-the-art captions by following exploration and exploitation strategies. The transformer mechanism is a modern and efficient transductive architecture for robust output. Free from recurrence, and solely based on self-attention, it allows parallelization along with training on a massive amount of data. It can fully utilize the available GPUs for most NLP tasks. Recently, with the emergence of several versions of transformers, long-term dependency handling is no longer an issue for researchers engaged in video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional purposes. They can draw promising directions from this research.
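The abstract's remark that self-attention is free from recurrence and allows parallelization can be illustrated with a minimal sketch of scaled dot-product self-attention. This is not code from the surveyed paper: the function names, the toy frame features, and the use of identity projections (queries, keys, and values all equal to the input) are assumptions made purely for illustration.

```python
import math

def softmax(row):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption for illustration: identity projections, so Q = K = V = frames.
    A real transformer layer would apply learned projection matrices.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    weights = []
    for query in frames:
        # Each query row depends only on the inputs, not on earlier outputs,
        # so all rows could be computed in parallel (no recurrence).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, frames)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs

# Three hypothetical 2-dimensional frame features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(features)
```

Each row of `weights` is a probability distribution over all frames, which is how every position can attend directly to any other position regardless of distance.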
Pages: 13293-13372
Page count: 80