A Deep Structured Model for Video Captioning

Cited by: 5
|
Authors
Vinodhini, V. [1 ]
Sathiyabhama, B. [2 ]
Sankar, S. [1 ]
Somula, Ramasubbareddy [3 ]
Affiliations
[1] Sona Coll Technol, Salem, India
[2] Sona Coll Technol, Dept CSE, Salem, India
[3] Vallurupalli Nageswara Rao Vignana Jyothi Inst En, Hyderabad, India
Keywords
Convolutional Neural Network; Hidden Markov Model; Long Short-Term Memory Networks; Video Caption;
DOI
10.4018/IJGCMS.2020040103
CLC Classification Number
TP39 [Computer Applications];
Subject Classification Codes
081203 ; 0835 ;
Abstract
Video captions help people understand video in noisy environments or when the sound is muted, and they help people with impaired hearing to understand much better. Captions not only support content creators and translators but also boost search engine optimization. Many advanced areas, such as computer vision and human-computer interaction, play a vital role here owing to the successful growth of deep learning techniques. Numerous surveys of deep learning models have appeared, covering different methods, architectures, and metrics, yet working with video subtitles remains challenging in terms of activity recognition in video. This paper proposes a deep structured model that is effective for activity recognition and that automatically classifies activities and captions them within a single architecture. The first step separates the foreground from the background by building a 3D convolutional neural network (CNN) model; a Gaussian mixture model is used to remove the backdrop. Classification is done using long short-term memory networks (LSTM), and a hidden Markov model (HMM) is used to generate high-quality data. Next, a nonlinear activation function performs the normalization process. Finally, video captioning is produced in natural language.
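The backdrop-removal stage the abstract describes can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the paper's implementation: it models each pixel with a single running Gaussian (rather than a full mixture) and flags a pixel as foreground when it deviates from the background mean by more than k standard deviations.

```python
import random

def step(pixel, mean, var, lr=0.05, k=2.5):
    """One update of a per-pixel Gaussian background model.

    A single-Gaussian simplification (an illustrative assumption,
    not the paper's exact mixture model) of Gaussian-mixture
    backdrop removal: a pixel is foreground when it lies more
    than k standard deviations from the running background mean.
    """
    is_foreground = abs(pixel - mean) > k * var ** 0.5
    if not is_foreground:
        # Adapt the background statistics only to background pixels.
        mean = (1 - lr) * mean + lr * pixel
        var = (1 - lr) * var + lr * (pixel - mean) ** 2
    return is_foreground, mean, var

random.seed(0)
mean, var = 100.0, 4.0
for _ in range(20):  # learn a static backdrop near intensity 100
    fg, mean, var = step(100.0 + random.gauss(0, 1), mean, var)
fg, mean, var = step(200.0, mean, var)  # a bright object enters
print(fg)  # True: 200 deviates far beyond 2.5 sigma of the backdrop
```

A full mixture model would keep several (mean, variance, weight) components per pixel and match each incoming pixel to its closest component, which handles multimodal backgrounds such as swaying trees; the one-component version above conveys only the thresholding idea.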
Pages: 44-56 (13 pages)