Context Gating with Short Temporal Information for Video Captioning

Cited by: 0
Authors
Xu, Jinlei [1]
Xu, Ting [1]
Tian, Xin [1]
Liu, Chunping [1]
Ji, Yi [1]
Affiliation
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
video captioning; CNN; GRU; C3D; ML;
DOI
10.1109/ijcnn.2019.8851897
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video captioning is a newly emerging task that automatically translates the content of a video into a textual description. As in image captioning, most existing methods simply use extracted visual features to generate sentences. In video captioning, however, temporal information is much more important for description, yet short temporal information (STI) is usually ignored. Meanwhile, the context of the generated sentence has not been sufficiently mined. In this paper, we build a context gating mechanism with STI on an encoder-decoder neural framework (CG-ED) for video captioning. In our approach, based on the 2D feature space, we cut and recombine the whole 3D features to extract STI according to their temporal distribution. Context gating is designed to balance the contributions of the different contexts of a sentence. The proposed model is evaluated on two large-scale datasets: Microsoft Research Video to Text (MSR-VTT) and the Microsoft Research Video Description Corpus (MSVD). Experimental results demonstrate that its captioning precision is higher than that of most state-of-the-art approaches.
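The abstract does not give the gating equations, so the following is only a minimal sketch of a generic context gate, not the authors' CG-ED formulation. It assumes two context vectors of equal length, a sigmoid gate computed from their concatenation, and a convex blend of the two; the names `context_gate`, `ctx_a`, `ctx_b`, `weights`, and `bias` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(ctx_a, ctx_b, weights, bias):
    """Blend two context vectors with an element-wise gate.

    Assumption (not taken from the paper): the gate is
    g = sigmoid(W [ctx_a; ctx_b] + b) and the output is
    g * ctx_a + (1 - g) * ctx_b.

    ctx_a, ctx_b: lists of floats with the same length d.
    weights:      d rows, each holding 2*d floats (hypothetical gate matrix W).
    bias:         list of d floats (hypothetical gate bias b).
    """
    if len(ctx_a) != len(ctx_b):
        raise ValueError("context vectors must have the same length")
    joint = list(ctx_a) + list(ctx_b)  # concatenation [ctx_a; ctx_b]
    blended = []
    for i, (a, b) in enumerate(zip(ctx_a, ctx_b)):
        # Pre-activation of the i-th gate unit.
        pre = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(pre)
        # Gate near 1 favours ctx_a, gate near 0 favours ctx_b.
        blended.append(g * a + (1.0 - g) * b)
    return blended
```

With all-zero weights and bias every gate value is 0.5, so the result is the plain average of the two contexts; a trained gate would instead shift the balance per dimension.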
Pages: 7
Related papers
50 in total
  • [1] Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
    Wang, Jingwen
    Jiang, Wenhao
    Ma, Lin
    Liu, Wei
    Xu, Yong
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7190 - 7198
  • [2] Exploiting the local temporal information for video captioning
    Wei, Ran
    Mi, Li
    Hu, Yaosi
    Chen, Zhenzhong
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 67
  • [3] Long Short-Term Relation Transformer With Global Gating for Video Captioning
    Li, Liang
    Gao, Xingyu
    Deng, Jincan
    Tu, Yunbin
    Zha, Zheng-Jun
    Huang, Qingming
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2726 - 2738
  • [4] Short Video Recommendation Algorithm Incorporating Temporal Contextual Information and User Context
    Liu, Weihua
    Wan, Haoyang
    Yan, Boyuan
    [J]. CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 135 (01): : 239 - 258
  • [5] Context Visual Information-based Deliberation Network for Video Captioning
    Lu, Min
    Li, Xueyong
    Liu, Caihua
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9812 - 9818
  • [6] Multi-scale features with temporal information guidance for video captioning
    Zhao, Hong
    Chen, Zhiwen
    Yang, Yi
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [7] Understanding temporal structure for video captioning
    Sah, Shagan
    Nguyen, Thang
    Ptucha, Ray
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (01) : 147 - 159
  • [8] Understanding temporal structure for video captioning
    Shagan Sah
    Thang Nguyen
    Ray Ptucha
    [J]. Pattern Analysis and Applications, 2020, 23 : 147 - 159
  • [9] Temporal Attention Feature Encoding for Video Captioning
    Kim, Nayoung
    Ha, Seong Jong
    Kang, Je-Won
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 1279 - 1282
  • [10] Semantic similarity information discrimination for video captioning
    Du, Sen
    Zhu, Hong
    Xiong, Ge
    Lin, Guangfeng
    Wang, Dong
    Shi, Jing
    Wang, Jing
    Xing, Nan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213