Sequential Memory Modelling for Video Captioning

Cited: 0
Authors
Puttaraja [1 ]
Nayaka, Chidambara [1 ]
Manikesh [1 ]
Sharma, Nitin [1 ]
Anand, Kumar M. [1 ]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Informat Technol, Surathkal 575025, India
Keywords
Deep learning; NLP; Natural Language Processing; LSTM; Encoder-Decoder Model
DOI
10.1109/INDICON56171.2022.10039829
CLC Classification Number
TP39 [Computer Applications]
Subject Classification Number
081203; 0835
Abstract
In recent years, the automatic generation of natural language descriptions of video has become a focus of deep learning and natural language processing research. Video understanding has many applications, such as video search and indexing, but video captioning remains a sophisticated problem because video content is complex and diverse. Bridging video and natural language is still an open issue, and multiple methods have been proposed to better understand video and generate captions automatically. Deep learning methods dominate this direction thanks to their performance and high-speed computing capabilities. This paper discusses an end-to-end, frame-based encoder-decoder network built on a deep learning approach for caption generation. We describe the model, the dataset, and the parameters used to evaluate the model.
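The abstract describes a frame-based, end-to-end encoder-decoder model built around LSTMs. As a rough illustration of how such an architecture is typically wired together, below is a minimal sketch assuming PyTorch; the class name, feature dimensions, and vocabulary size are assumptions for the example, not details taken from the paper.

```python
# Minimal sketch (assumption: PyTorch) of a frame-level LSTM encoder-decoder
# for video captioning. Module names, dimensions, and the vocabulary size are
# illustrative choices, not details taken from the paper.
import torch
import torch.nn as nn


class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=300, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Encoder LSTM reads a sequence of pre-extracted per-frame CNN features.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Decoder LSTM generates the caption, initialized with the video summary.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, num_frames, feat_dim); captions: (batch, seq_len) token ids
        _, state = self.encoder(frame_feats)   # (h, c) summarizes the whole clip
        dec_out, _ = self.decoder(self.embed(captions), state)
        return self.out(dec_out)               # (batch, seq_len, vocab_size) logits


# Toy forward pass: 2 clips of 40 frames each, teacher-forced 15-token captions.
model = VideoCaptioner()
logits = model(torch.randn(2, 40, 2048), torch.randint(0, 10000, (2, 15)))
print(logits.shape)  # torch.Size([2, 15, 10000])
```

During training such a model is typically teacher-forced with a cross-entropy loss over the vocabulary; at inference the decoder would instead feed back its own predicted tokens step by step.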
Pages: 8
Related Papers
50 records in total
  • [1] Hierarchical Memory Modelling for Video Captioning
    Wang, Junbo
    Wang, Wei
    Huang, Yan
    Wang, Liang
    Tan, Tieniu
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 63 - 71
  • [2] M3: Multimodal Memory Modelling for Video Captioning
    Wang, Junbo
    Wang, Wei
    Huang, Yan
    Wang, Liang
    Tan, Tieniu
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7512 - 7520
  • [3] Memory-Based Augmentation Network for Video Captioning
    Jing, Shuaiqi
    Zhang, Haonan
    Zeng, Pengpeng
    Gao, Lianli
    Song, Jingkuan
    Shen, Heng Tao
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2367 - 2379
  • [4] Memory-Attended Recurrent Network for Video Captioning
    Pei, Wenjie
    Zhang, Jiyuan
    Wang, Xiangrong
    Ke, Lei
    Shen, Xiaoyong
    Tai, Yu-Wing
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8339 - 8348
  • [5] Automatic Video Captioning via Multi-channel Sequential Encoding
    Zhang, Chenyang
    Tian, Yingli
    [J]. COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 : 146 - 161
  • [6] A multi-layer memory sharing network for video captioning
    Niu, Tian-Zi
    Dong, Shan-Shan
    Chen, Zhen-Duo
    Luo, Xin
    Huang, Zi
    Guo, Shanqing
    Xu, Xin-Shun
    [J]. PATTERN RECOGNITION, 2023, 136
  • [7] Memory-enhanced hierarchical transformer for video paragraph captioning
    Zhang, Benhui
    Gao, Junyu
    Yuan, Yuan
    [J]. NEUROCOMPUTING, 2025, 615
  • [8] Enhanced-Memory Transformer for Coherent Paragraph Video Captioning
    Cardoso, Leonardo Vilela
    Guimaraes, Silvio Jamil F.
    Patrocinio Jr., Zenilton K. G.
    [J]. 2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 836 - 840
  • [9] Multimodal architecture for video captioning with memory networks and an attention mechanism
    Li, Wei
    Guo, Dashan
    Fang, Xiangzhong
    [J]. PATTERN RECOGNITION LETTERS, 2018, 105 : 23 - 29
  • [10] Image/Video Captioning
    Ushiku, Yoshitaka
    [J]. Inst. of Image Information and Television Engineers, 2018, (72)