Video Captioning of Future Frames

Cited: 1
|
Authors
Hosseinzadeh, Mehrdad [1 ]
Wang, Yang [1 ,2 ]
Affiliations
[1] Univ Manitoba, Winnipeg, MB, Canada
[2] Huawei Technol Canada, Markham, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
10.1109/WACV48630.2021.00102
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Being able to anticipate and describe what may happen in the future is a fundamental human ability. Given a short clip of a scene in which "a person is sitting behind a piano", humans can describe what will happen afterward, e.g. "the person is playing the piano". In this paper, we consider the task of captioning future events, which assesses the performance of intelligent models on anticipation and video description generation simultaneously. More specifically, given only the frames of an occurring event (activity), the goal is to generate a sentence describing the most likely next event in the video. We tackle the problem by first predicting the next event in the semantic space of convolutional features, then fusing contextual information into those features and feeding them to a captioning module. Avoiding recurrent units allows us to train the network in parallel. We compare the proposed method with a baseline and an oracle method on the ActivityNet-Captions dataset. Experimental results demonstrate that the proposed method outperforms the baseline and is comparable to the oracle method. We also perform an ablation study to further analyze our approach.
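The three-stage pipeline the abstract outlines (predict next-event features, fuse context, decode a caption without recurrence) can be sketched as follows. This is a minimal illustration only: the module names, dimensions, and the simple linear maps below are assumptions for exposition, not the authors' actual architecture.

```python
import numpy as np

# Hedged sketch of the pipeline from the abstract:
# (1) predict the next event in the semantic space of convolutional features,
# (2) fuse contextual information into those features,
# (3) feed them to a non-recurrent captioning module.
# Every dimension and layer choice here is illustrative, not from the paper.

rng = np.random.default_rng(0)
FEAT_DIM, CTX_DIM, VOCAB, MAX_LEN = 512, 128, 1000, 8

# (1) Future-feature prediction: map observed-event features to a
# prediction of the next event's features (here, one linear layer).
W_pred = rng.standard_normal((FEAT_DIM, FEAT_DIM)) * 0.01

def predict_future(observed_feat):
    return observed_feat @ W_pred

# (2) Context fusion: concatenate the predicted features with a context
# vector and project back to the feature dimension.
W_fuse = rng.standard_normal((FEAT_DIM + CTX_DIM, FEAT_DIM)) * 0.01

def fuse_context(future_feat, context):
    return np.concatenate([future_feat, context]) @ W_fuse

# (3) Non-recurrent captioning head: score the whole vocabulary for every
# output position at once (this is what permits parallel training, unlike
# an RNN decoder that emits tokens sequentially), then take the argmax.
W_cap = rng.standard_normal((FEAT_DIM, MAX_LEN * VOCAB)) * 0.01

def caption(fused_feat):
    logits = (fused_feat @ W_cap).reshape(MAX_LEN, VOCAB)
    return logits.argmax(axis=1)  # one token id per caption position

observed = rng.standard_normal(FEAT_DIM)  # features of the observed event
context = rng.standard_normal(CTX_DIM)    # e.g. pooled clip-level context
tokens = caption(fuse_context(predict_future(observed), context))
print(tokens.shape)  # one token id per position, MAX_LEN in total
```

In practice each stage would be a learned network trained end to end; the point of the sketch is only the data flow, and that the decoder produces all positions in one pass rather than one token at a time.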
Pages: 979-988
Page count: 10