BiTransformer: augmenting semantic context in video captioning via bidirectional decoder

Cited by: 5
Authors
Zhong, Maosheng [1 ]
Zhang, Hao [1 ]
Wang, Yong [1 ]
Xiong, Hao [1 ]
Affiliations
[1] Jiangxi Normal Univ, 99 Ziyang Ave, Nanchang, Jiangxi, Peoples R China
Keywords
Video captioning; Bidirectional decoding; Transformer;
DOI
10.1007/s00138-022-01329-3
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video captioning, the task of generating natural-language descriptions of video content, is an important problem with many applications. Most existing methods are based on deep encoder-decoder models, particularly attention-based models such as the Transformer. However, existing Transformer-based models may not fully exploit the semantic context: they use only the left-to-right context and ignore its right-to-left counterpart. In this paper, we introduce a bidirectional (forward-backward) decoder that exploits both the left-to-right and right-to-left context for Transformer-based video captioning; accordingly, our model is called the bidirectional Transformer (dubbed BiTransformer). Specifically, alongside the forward decoder used in existing Transformer-based models, which attends to the encoder output and captures the left-to-right context, we plug in a backward decoder to capture the right-to-left context. Equipped with such a bidirectional decoder, the semantic context of videos is more fully exploited, resulting in better video captions. The effectiveness of our model is demonstrated on two benchmark datasets, MSVD and MSR-VTT, via comparison with state-of-the-art methods. In particular, in terms of the important evaluation metric CIDEr, the proposed model outperforms the state-of-the-art models by 1.2% on both datasets.
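The abstract describes the core architectural idea: a backward decoder is added alongside the standard forward decoder so that captions are modeled in both directions against the encoded video features. The following is a minimal PyTorch sketch of such a bidirectional decoder, given only as an illustration; it is not the authors' implementation, and the module names, the fusion-by-summation step, and the hyperparameters are assumptions.

# Minimal sketch of a bidirectional (forward-backward) Transformer decoder for
# captioning. Illustrative only; fusion and training details may differ from the paper.
import torch
import torch.nn as nn

class BiTransformerDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Forward decoder captures left-to-right context,
        # backward decoder captures right-to-left context.
        self.fwd_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.bwd_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, memory, captions):
        # memory:   encoded video features, shape (B, T_video, d_model)
        # captions: caption token ids,      shape (B, T_text)
        tgt = self.embed(captions)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        # Left-to-right pass over the caption, attending to the video memory.
        h_fwd = self.fwd_decoder(tgt, memory, tgt_mask=mask)
        # Right-to-left pass: reverse the caption, decode causally, reverse back.
        h_bwd = self.bwd_decoder(torch.flip(tgt, dims=[1]), memory, tgt_mask=mask)
        h_bwd = torch.flip(h_bwd, dims=[1])
        # Fuse the two directions (a simple sum here, as an assumption).
        return self.out(h_fwd + h_bwd)

# Toy usage: 2 videos encoded into 16 feature vectors of size 512,
# captions of length 10 over a 1000-word vocabulary.
decoder = BiTransformerDecoder(vocab_size=1000)
memory = torch.randn(2, 16, 512)
captions = torch.randint(0, 1000, (2, 10))
logits = decoder(memory, captions)  # shape (2, 10, 1000)

Here the right-to-left context is obtained by reversing the caption before and after a causally masked decoder pass; how the paper actually combines the forward and backward decoders during training and inference is not specified in the abstract.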
Pages: 9
Related papers
50 items in total
  • [21] SEMANTIC LEARNING NETWORK FOR CONTROLLABLE VIDEO CAPTIONING
    Chen, Kaixuan
    Di, Qianji
    Lu, Yang
    Wang, Hanzi
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 880 - 884
  • [22] Discriminative Latent Semantic Graph for Video Captioning
    Bai, Yang
    Wang, Junyan
    Long, Yang
    Hu, Bingzhang
    Song, Yang
    Pagnucco, Maurice
    Guan, Yu
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3556 - 3564
  • [23] Boundary Detector Encoder and Decoder with Soft Attention for Video Captioning
    Chen, Tangming
    Zhao, Qike
    Song, Jingkuan
    WEB AND BIG DATA, APWEB-WAIM 2019, 2019, 11809 : 105 - 115
  • [24] A Context Semantic Auxiliary Network for Image Captioning
    Li, Jianying
    Shao, Xiangjun
    INFORMATION, 2023, 14 (07)
  • [25] Bidirectional difference locating and semantic consistency reasoning for change captioning
    Sun, Yaoqi
    Li, Liang
    Yao, Tingting
    Lu, Tongyv
    Zheng, Bolun
    Yan, Chenggang
    Zhang, Hua
    Bao, Yongjun
    Ding, Guiguang
    Slabaugh, Gregory
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (05) : 2969 - 2987
  • [26] Attentive Visual Semantic Specialized Network for Video Captioning
    Perez-Martin, Jesus
    Bustos, Benjamin
    Perez, Jorge
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5767 - 5774
  • [27] Video Captioning with Semantic Information from the Knowledge Base
    Wang, Dan
    Song, Dandan
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017, : 224 - 229
  • [28] Structured Encoding Based on Semantic Disambiguation for Video Captioning
    Sun, Bo
    Tian, Jinyu
    Wu, Yong
    Yu, Lunjun
    Tang, Yuanyan
    COGNITIVE COMPUTATION, 2024, 16 (03) : 1032 - 1048
  • [29] Semantic Tag Augmented XlanV Model for Video Captioning
    Huang, Yiqing
    Xue, Hongwei
    Chen, Jiansheng
    Ma, Huimin
    Ma, Hongbing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4818 - 4822
  • [30] Video captioning with stacked attention and semantic hard pull
    Rahman, Md Mushfiqur
    Abedin, Thasin
    Prottoy, Khondokar S. S.
    Moshruba, Ayana
    Siddiqui, Fazlul Hasan
    PEERJ COMPUTER SCIENCE, 2021, 7 : 1 - 18