Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

Cited by: 110
Authors
Wang, Jingwen [1 ,2 ]
Jiang, Wenhao [2 ]
Ma, Lin [2 ]
Liu, Wei [2 ]
Xu, Yong [1 ]
Affiliations
[1] South China Univ Technol, Guangzhou, Guangdong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA USA
DOI
10.1109/CVPR.2018.00751
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dense video captioning is a newly emerging task that aims at both localizing and describing all events in a video. We identify and tackle two challenges in this task, namely, (1) how to utilize both past and future contexts for accurate event proposal predictions, and (2) how to construct informative input to the decoder for generating natural event descriptions. First, previous works predominantly generate temporal event proposals in the forward direction, which neglects future video context. We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions. Second, different events ending at (nearly) the same time are indistinguishable in previous works, resulting in the same captions. We solve this problem by representing each event with an attentive fusion of hidden states from the proposal module and video contents (e.g., C3D features). We further propose a novel context gating mechanism to dynamically balance the contributions from the current event and its surrounding contexts. We empirically show that our attentively fused event representation is superior to the proposal hidden states or video contents alone. By coupling proposal and captioning modules into one unified framework, our model outperforms the state of the art on the ActivityNet Captions dataset with a relative gain of over 100% (METEOR score increases from 4.82 to 9.65).
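The two ideas named in the abstract, attentive fusion and context gating, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product attention score, and the per-dimension (diagonal) gate weights are all assumptions chosen to keep the example small. A real model would presumably use learned weight matrices and also condition on the decoder state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_fusion(clip_features, query):
    """Attention-weighted average of clip features.

    Assumption: each clip is scored by a plain dot product with `query`
    (for example, a proposal hidden state of the same dimension).
    """
    weights = softmax([dot(f, query) for f in clip_features])
    dim = len(clip_features[0])
    return [sum(w * f[i] for w, f in zip(weights, clip_features))
            for i in range(dim)]


def context_gate(event, context, w_event, w_context, bias):
    """Balance an event vector against its context vector.

    Assumption: the gate is computed per dimension with hypothetical
    diagonal weights. Returns the gate and the concatenation of the
    gated event and the gated context.
    """
    gate = [sigmoid(we * e + wc * c + b)
            for e, c, we, wc, b in zip(event, context, w_event, w_context, bias)]
    gated_event = [(1.0 - g) * e for g, e in zip(gate, event)]
    gated_context = [g * c for g, c in zip(gate, context)]
    return gate, gated_event + gated_context


# Usage: fuse two clip features, then gate the result against a context vector.
clips = [[1.0, 3.0], [3.0, 5.0]]
event_vec = attentive_fusion(clips, query=[0.0, 0.0])
gate, fused = context_gate(event_vec, [1.0, 3.0],
                           w_event=[0.0, 0.0], w_context=[0.0, 0.0],
                           bias=[0.0, 0.0])
```

With a zero query the attention weights are uniform, so the event vector is the mean of the clip features. With zero gate weights and bias the gate sits at 0.5, so event and context contribute equally. Non-zero weights would shift that balance per dimension.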
Pages: 7190-7198
Number of pages: 9
Related papers
50 in total
  • [1] Context Gating with Short Temporal Information for Video Captioning
    Xu, Jinlei
    Xu, Ting
    Tian, Xin
    Liu, Chunping
    Ji, Yi
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [2] Dense Video Captioning With Early Linguistic Information Fusion
    Aafaq, Nayyer
    Mian, Ajmal
    Akhtar, Naveed
    Liu, Wei
    Shah, Mubarak
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2309 - 2322
  • [3] Position embedding fusion on transformer for dense video captioning
    Yang, Sixuan
    Tang, Pengjie
    Wang, Hanli
    Li, Qinyu
    DEVELOPMENTS OF ARTIFICIAL INTELLIGENCE TECHNOLOGIES IN COMPUTATION AND ROBOTICS, 2020, 12 : 792 - 799
  • [4] Cross-Domain Modality Fusion for Dense Video Captioning
    Aafaq N.
    Mian A.
    Liu W.
    Akhtar N.
    Shah M.
    IEEE Transactions on Artificial Intelligence, 2022, 3 (05): : 763 - 777
  • [5] BiTransformer: augmenting semantic context in video captioning via bidirectional decoder
    Maosheng Zhong
    Hao Zhang
    Yong Wang
    Hao Xiong
    Machine Vision and Applications, 2022, 33
  • [6] BiTransformer: augmenting semantic context in video captioning via bidirectional decoder
    Zhong, Maosheng
    Zhang, Hao
    Wang, Yong
    Xiong, Hao
    MACHINE VISION AND APPLICATIONS, 2022, 33 (05)
  • [7] Hierarchical Context-aware Network for Dense Video Event Captioning
    Ji, Lei
    Guo, Xianglin
    Huang, Haoyang
    Chen, Xilin
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2004 - 2013
  • [8] Survey of Dense Video Captioning
    Huang, Xiankai
    Zhang, Jiayu
    Wang, Xinyu
    Wang, Xiaochuan
    Liu, Ruijun
    Computer Engineering and Applications, 2023, 59 (12): : 28 - 48
  • [9] Streamlined Dense Video Captioning
    Mun, Jonghwan
    Yang, Linjie
    Ren, Zhou
    Xu, Ning
    Han, Bohyung
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3581 - +
  • [10] Attentive Visual Semantic Specialized Network for Video Captioning
    Perez-Martin, Jesus
    Bustos, Benjamin
    Perez, Jorge
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5767 - 5774