Context Gating with Short Temporal Information for Video Captioning

Cited by: 0

Authors:

Xu, Jinlei [1]

Xu, Ting [1]

Tian, Xin [1]

Liu, Chunping [1]

Ji, Yi [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China

Source:

2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2019

Funding:

National Natural Science Foundation of China;

Keywords:

video captioning; CNN; GRU; C3D; ML;

DOI:

10.1109/ijcnn.2019.8851897

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video captioning is a newly emerging task that automatically translates the content of a video into a textual description. As in image captioning, most existing methods simply use extracted visual features to generate sentences. In video captioning, however, temporal information is much more important for description, yet short temporal information (STI) is usually ignored. Meanwhile, the context of the generated sentence has not been sufficiently mined. In this paper, we build a context gating mechanism with STI on an encoder-decoder neural framework (CG-ED) for video captioning. In our approach, based on the 2D feature space, we cut and recombine the whole 3D features to extract STI by temporal distribution. Context gating is designed to balance the contributions of the different contexts of a sentence. The proposed model is evaluated on two large-scale datasets: Microsoft Research Video to Text (MSR-VTT) and the Microsoft Research Video Description Corpus (MSVD). Experimental results demonstrate that its captioning precision is higher than that of most state-of-the-art approaches.

Pages: 7

50 records in total

[1] Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Wang, Jingwen
Jiang, Wenhao
Ma, Lin
Liu, Wei
Xu, Yong
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7190 - 7198
[2] Exploiting the local temporal information for video captioning
Wei, Ran
Mi, Li
Hu, Yaosi
Chen, Zhenzhong
[J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 67
[3] Long Short-Term Relation Transformer With Global Gating for Video Captioning
Li, Liang
Gao, Xingyu
Deng, Jincan
Tu, Yunbin
Zha, Zheng-Jun
Huang, Qingming
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2726 - 2738
[4] Short Video Recommendation Algorithm Incorporating Temporal Contextual Information and User Context
Liu, Weihua
Wan, Haoyang
Yan, Boyuan
[J]. CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 135 (01) : 239 - 258
[5] Context Visual Information-based Deliberation Network for Video Captioning
Lu, Min
Li, Xueyong
Liu, Caihua
[J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9812 - 9818
[6] Multi-scale features with temporal information guidance for video captioning
Zhao, Hong
Chen, Zhiwen
Yang, Yi
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
[7] Understanding temporal structure for video captioning
Sah, Shagan
Nguyen, Thang
Ptucha, Ray
[J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (01) : 147 - 159
[8] Understanding temporal structure for video captioning
Shagan Sah
Thang Nguyen
Ray Ptucha
[J]. Pattern Analysis and Applications, 2020, 23 : 147 - 159
[9] Temporal Attention Feature Encoding for Video Captioning
Kim, Nayoung
Ha, Seong Jong
Kang, Je-Won
[J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 1279 - 1282
[10] Semantic similarity information discrimination for video captioning
Du, Sen
Zhu, Hong
Xiong, Ge
Lin, Guangfeng
Wang, Dong
Shi, Jing
Wang, Jing
Xing, Nan
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
