Hierarchical Language Modeling for Dense Video Captioning

Authors
Dave, Jaivik [1]
Padmavathi, S. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Engn, Coimbatore, Tamil Nadu, India
Keywords
Video description; Dense video captioning; Computer vision; Natural language processing
DOI
10.1007/978-981-16-6723-7_32
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The objective of the video description, or dense video captioning, task is to generate a description of the video content. The task consists of identifying and describing distinct temporal segments called events. Existing methods use relative context to obtain better sentences. In this paper, we propose a hierarchical captioning model that follows an encoder-decoder scheme and consists of two LSTMs for sentence generation. The visual and language information is encoded as context using a bi-directional alteration of the single-stream temporal action proposal network, and this context is used in the next stage to produce coherent and contextually aware sentences. The proposed system was tested on the ActivityNet captioning dataset and performed relatively better than other existing approaches.
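
The two-LSTM hierarchy described in the abstract can be pictured with a small sketch. This is only one plausible reading, not the authors' implementation: it assumes an event-level LSTM that carries context from one event to the next and a word-level LSTM that decodes each sentence conditioned on that context. The names `TinyLSTMCell` and `caption_events`, the layer sizes, the start-token id, and greedy decoding are all hypothetical, and the weights are random and untrained, so the emitted token ids carry no meaning.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Minimal LSTM cell on plain Python lists (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # acts on the concatenation [input, hidden]
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                        for _ in range(4 * hidden_size)]
        self.bias = [0.0] * (4 * hidden_size)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.weights, self.bias)]
        n = self.hidden_size
        i = [_sigmoid(p) for p in pre[0:n]]            # input gate
        f = [_sigmoid(p) for p in pre[n:2 * n]]        # forget gate
        g = [math.tanh(p) for p in pre[2 * n:3 * n]]   # candidate cell state
        o = [_sigmoid(p) for p in pre[3 * n:4 * n]]    # output gate
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def caption_events(event_features: List[Vector], vocab_size: int = 6,
                   hidden: int = 4, max_words: int = 5,
                   seed: int = 0) -> List[List[int]]:
    """Decode one token-id sequence per event with a two-level LSTM.

    Assumption: the event-level LSTM sees the event's visual features plus
    the final word-level state of the previous sentence (the "context").
    """
    rng = random.Random(seed)
    feat_dim = len(event_features[0])
    context_lstm = TinyLSTMCell(feat_dim + hidden, hidden, rng)  # event level
    word_lstm = TinyLSTMCell(hidden + vocab_size, hidden, rng)   # word level
    out_proj = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                for _ in range(vocab_size)]

    ctx_h, ctx_c = [0.0] * hidden, [0.0] * hidden
    prev_sentence = [0.0] * hidden
    captions: List[List[int]] = []
    for feats in event_features:
        # Event-level step: fuse visual features with the previous sentence state.
        ctx_h, ctx_c = context_lstm.step(feats + prev_sentence, ctx_h, ctx_c)
        h, c = [0.0] * hidden, [0.0] * hidden
        token = 0  # hypothetical start-of-sentence id
        words: List[int] = []
        for _ in range(max_words):
            one_hot = [1.0 if k == token else 0.0 for k in range(vocab_size)]
            # Word-level step: condition every word on the event context.
            h, c = word_lstm.step(ctx_h + one_hot, h, c)
            scores = [sum(w * v for w, v in zip(row, h)) for row in out_proj]
            token = max(range(vocab_size), key=scores.__getitem__)  # greedy pick
            words.append(token)
        prev_sentence = h  # language context handed to the next event
        captions.append(words)
    return captions
```

Calling `caption_events([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])` returns one list of token ids per event. The point of the sketch is the data flow: the second event's sentence depends on the first through both the event-level state and the previous sentence state.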
Pages: 421-431
Page count: 11