Global Object Proposals for Improving Multi-Sentence Video Descriptions

Cited by: 0
Authors
Kanani, Chandresh S. [1]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol, Dept CSE, Patna, Bihar, India
Keywords
Global Objects; Video Description; ActivityNet
DOI
10.1109/IJCNN52387.2021.9533883
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There has been significant progress in image captioning in recent years. The generation of video descriptions is still in its early stages because videos are more complex than images. Generating paragraph descriptions of a video is even more challenging. Among the main issues are temporal object dependencies and complex object-object relationships. Recently, many works have been proposed for generating multi-sentence video descriptions. Most of them follow a two-step approach: 1) event proposals and 2) caption generation. While these approaches produce good results, they miss out on globally available information. Here we propose the use of global object proposals while generating the video captions. Experimental results on the ActivityNet dataset illustrate that the use of global object proposals can produce more informative and correct captions. We also propose three scores to evaluate the object detection capacity of the generator. A comparison of captions generated by the proposed method and by state-of-the-art techniques supports the efficacy of the proposed method.
Pages: 7