BiTransformer: augmenting semantic context in video captioning via bidirectional decoder

被引:5
|
作者
Zhong, Maosheng [1 ]
Zhang, Hao [1 ]
Wang, Yong [1 ]
Xiong, Hao [1 ]
机构
[1] Jiangxi Normal Univ, 99 Ziyang Ave, Nanchang, Jiangxi, Peoples R China
关键词
Video captioning; Bidirectional decoding; Transformer;
D O I
10.1007/s00138-022-01329-3
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Video captioning is an important problem involved in many applications. It aims to generate some descriptions of the content of a video. Most of existing methods for video captioning are based on the deep encoder-decoder models, particularly, the attention-based models (say Transformer). However, the existing transformer-based models may not fully exploit the semantic context, that is, only using the left-to-right style of context but ignoring the right-to-left counterpart. In this paper, we introduce a bidirectional (forward-backward) decoder to exploit both the left-to-right and right-to-left styles of context for the Transformer-based video captioning model. Thus, our model is called bidirectional Transformer (dubbed BiTransformer). Specifically, in the bridge of the encoder and forward decoder (aiming to capture the left-to-right context) used in the existing Transformer-based models, we plug in a backward decoder to capture the right-to-left context. Equipped with such bidirectional decoder, the semantic context of videos will be more fully exploited, resulting in better video captions. The effectiveness of our model is demonstrated over two benchmark datasets, i.e., MSVD and MSR-VTT,via comparing to the state-of-the-art methods. Particularly, in terms of the important evaluation metric CIDEr, the proposed model outperforms the state-of-the-art models with improvements of 1.2% in both datasets.
引用
收藏
页数:9
相关论文
共 50 条
  • [41] Video Captioning With Attention-Based LSTM and Semantic Consistency
    Gao, Lianli
    Guo, Zhao
    Zhang, Hanwang
    Xu, Xing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (09) : 2045 - 2055
  • [42] Video Captioning via Hierarchical Reinforcement Learning
    Wang, Xin
    Chen, Wenhu
    Wu, Jiawei
    Wang, Yuan-Fang
    Wang, William Yang
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4213 - 4222
  • [43] Semantic Enhanced Video Captioning with Multi-feature Fusion
    Niu, Tian-Zi
    Dong, Shan-Shan
    Chen, Zhen-Duo
    Luo, Xin
    Guo, Shanqing
    Huang, Zi
    Xu, Xin-Shun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [44] Learning topic emotion and logical semantic for video paragraph captioning
    Li, Qinyu
    Wang, Hanli
    Yi, Xiaokai
    DISPLAYS, 2024, 83
  • [45] Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
    Lu, Yifan
    Zhang, Ziqi
    Yuan, Chunfeng
    Li, Peng
    Wang, Yan
    Li, Bing
    Hu, Weiming
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3909 - 3917
  • [46] Video captioning algorithm based on mixed training and semantic association
    Chen, Shuqin
    Zhong, Xian
    Huang, Wenxin
    Lu, Yansheng
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, 51 (11): : 67 - 74
  • [47] Fused GRU with semantic-temporal attention for video captioning
    Gao, Lianli
    Wang, Xuanhan
    Song, Jingkuan
    Liu, Yang
    NEUROCOMPUTING, 2020, 395 : 222 - 228
  • [48] Encoder-Decoder Model for Automatic Video Captioning Using Yolo Algorithm
    Alkalouti, Hanan Nasser
    Al Masre, Mayada Ahmed
    2021 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS), 2021, : 718 - 721
  • [49] Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
    Zhang, Junchao
    Peng, Yuxin
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8319 - 8328
  • [50] Traffic Scenario Understanding and Video Captioning via Guidance Attention Captioning Network
    Liu, Chunsheng
    Zhang, Xiao
    Chang, Faliang
    Li, Shuang
    Hao, Penghui
    Lu, Yansha
    Wang, Yinhai
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (05) : 3615 - 3627