BiTransformer: augmenting semantic context in video captioning via bidirectional decoder

被引:5
|
作者
Zhong, Maosheng [1 ]
Zhang, Hao [1 ]
Wang, Yong [1 ]
Xiong, Hao [1 ]
机构
[1] Jiangxi Normal Univ, 99 Ziyang Ave, Nanchang, Jiangxi, Peoples R China
关键词
Video captioning; Bidirectional decoding; Transformer;
D O I
10.1007/s00138-022-01329-3
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Video captioning is an important problem involved in many applications. It aims to generate some descriptions of the content of a video. Most of existing methods for video captioning are based on the deep encoder-decoder models, particularly, the attention-based models (say Transformer). However, the existing transformer-based models may not fully exploit the semantic context, that is, only using the left-to-right style of context but ignoring the right-to-left counterpart. In this paper, we introduce a bidirectional (forward-backward) decoder to exploit both the left-to-right and right-to-left styles of context for the Transformer-based video captioning model. Thus, our model is called bidirectional Transformer (dubbed BiTransformer). Specifically, in the bridge of the encoder and forward decoder (aiming to capture the left-to-right context) used in the existing Transformer-based models, we plug in a backward decoder to capture the right-to-left context. Equipped with such bidirectional decoder, the semantic context of videos will be more fully exploited, resulting in better video captions. The effectiveness of our model is demonstrated over two benchmark datasets, i.e., MSVD and MSR-VTT,via comparing to the state-of-the-art methods. Particularly, in terms of the important evaluation metric CIDEr, the proposed model outperforms the state-of-the-art models with improvements of 1.2% in both datasets.
引用
收藏
页数:9
相关论文
共 50 条
  • [31] Dense video captioning using unsupervised semantic information
    Estevam, Valter
    Laroca, Rayson
    Pedrini, Helio
    Menotti, David
    Journal of Visual Communication and Image Representation, 2025, 107
  • [32] Richer Semantic Visual and Language Representation for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Wang, Hanzhang
    Xu, Kaisheng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1871 - 1876
  • [33] Empirical autopsy of deep video captioning encoder-decoder architecture
    Aafaq, Nayyer
    Akhtar, Naveed
    Liu, Wei
    Mian, Ajmal
    ARRAY, 2021, 9
  • [34] Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [35] Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8167 - 8174
  • [36] Image captioning via semantic element embedding
    Zhang, Xiaodan
    He, Shengfeng
    Song, Xinhang
    Lau, Rynson W. H.
    Jiao, Jianbin
    Ye, Qixiang
    NEUROCOMPUTING, 2020, 395 : 212 - 221
  • [37] Abstractive Document Summarization via Bidirectional Decoder
    Wan, Xin
    Li, Chen
    Wang, Ruijia
    Xiao, Ding
    Shi, Chuan
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 364 - 377
  • [38] Context Gating with Short Temporal Information for Video Captioning
    Xu, Jinlei
    Xu, Ting
    Tian, Xin
    Liu, Chunping
    Ji, Yi
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [39] A Video Captioning Method by Semantic Topic-Guided Generation
    Ye, Ou
    Wei, Xinli
    Yu, Zhenhua
    Fu, Yan
    Yang, Ying
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (01): : 1071 - 1093
  • [40] Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
    Lei, Zhou
    Huang, Yiyong
    FUTURE INTERNET, 2021, 13 (02) : 1 - 18