MULTISTREAM HIERARCHICAL BOUNDARY NETWORK FOR VIDEO CAPTIONING

Times cited: 0
Authors
Nguyen, Thang [1]
Sah, Shagan [1]
Ptucha, Raymond [1]
Affiliation
[1] Rochester Institute of Technology, Rochester, NY 14623, USA
Keywords
video captioning; video boundary; hierarchical models; attention
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract
Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into everyday life. Video captioning offers a way to summarize, index, and search this data. Most captioning models use an encoder-decoder framework consisting of a video encoder and a caption decoder. Hierarchical encoders can capture clip-level temporal features as an abstract representation of a video, but their clips are defined at fixed time steps. This paper introduces a novel Multistream Hierarchical Boundary (MHB) model, which combines a fixed-hierarchy recurrent architecture with a soft hierarchy layer that uses intrinsic feature boundary cuts within a video to define clips. A novel parametric Gaussian attention mechanism handles videos of variable length. In this way, the intrinsic properties of videos are used to form an adaptive hierarchical video representation. The model is trained end to end for video captioning and demonstrates state-of-the-art video captioning results on recent datasets.
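The abstract names two mechanisms without specifying them: boundary cuts that split a video into variable-length clips, and a parametric Gaussian attention that copes with variable-length input. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. Assumptions: frames are plain feature vectors; a boundary is declared where the cosine distance between consecutive frames exceeds a hand-picked threshold (a hard cut, whereas the abstract describes a soft hierarchy layer); and the attention is a Gaussian over normalised time in [0, 1] whose centre `mu` and width `sigma` are passed in directly rather than predicted by a trained network. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (0 for vectors pointing the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero frame as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def boundary_cuts(frames: Sequence[Vector], threshold: float = 0.5) -> List[int]:
    """Hypothetical hard-cut rule: a new clip starts at frame i when the
    cosine distance between frames i-1 and i exceeds `threshold`."""
    return [i for i in range(1, len(frames))
            if cosine_distance(frames[i - 1], frames[i]) > threshold]


def split_into_clips(frames: Sequence[Vector], cuts: Sequence[int]) -> List[List[Vector]]:
    """Split the frame sequence at the cut indices into variable-length clips."""
    edges = [0, *cuts, len(frames)]
    return [list(frames[start:end]) for start, end in zip(edges[:-1], edges[1:])]


def gaussian_attention(length: int, mu: float, sigma: float) -> List[float]:
    """Weights over `length` steps from a Gaussian with centre `mu` and width
    `sigma` in normalised time [0, 1]; the weights sum to 1, so the same two
    parameters apply to a sequence of any length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if length == 1:
        return [1.0]
    positions = [t / (length - 1) for t in range(length)]
    raw = [math.exp(-0.5 * ((p - mu) / sigma) ** 2) for p in positions]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / length] * length  # all weights underflowed: fall back to uniform
    return [r / total for r in raw]


def attend(features: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Weighted sum of feature vectors, giving one fixed-size context vector."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    # Toy 2-D "frame features": the content changes sharply between frames 1 and 2.
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
    cuts = boundary_cuts(frames)
    clips = split_into_clips(frames, cuts)
    for clip in clips:
        weights = gaussian_attention(len(clip), mu=0.5, sigma=0.25)
        print(len(clip), [round(w, 3) for w in weights], attend(clip, weights))
```

Because positions are rescaled to [0, 1], the same (`mu`, `sigma`) pair yields a valid weighting for a 2-frame clip and a 3-frame clip alike, and `attend` reduces each clip to a vector of the same size. That is one way a downstream recurrent layer could consume clips of differing length.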
Pages: 5