Global Object Proposals for Improving Multi-Sentence Video Descriptions

Cited by: 0
Authors
Kanani, Chandresh S. [1 ]
Saha, Sriparna [1 ]
Bhattacharyya, Pushpak [1 ]
Affiliations
[1] Indian Inst Technol, Dept CSE, Patna, Bihar, India
Keywords
Global Objects; Video Description; ActivityNet
DOI
10.1109/IJCNN52387.2021.9533883
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There has been significant progress in image captioning in recent years. The generation of video descriptions, however, is still in its early stages, owing to the greater complexity of videos compared to images. Generating paragraph-level descriptions of a video is more challenging still; among the main difficulties are temporal object dependencies and complex object-object relationships. Recently, many works have been proposed for generating multi-sentence video descriptions. Most of them follow a two-step approach: 1) event proposal and 2) caption generation. While these approaches produce good results, they miss out on globally available information. Here we propose the use of global object proposals while generating video captions. Experimental results on the ActivityNet dataset illustrate that global object proposals can produce more informative and more accurate captions. We also propose three scores to evaluate the object detection capacity of the generator. A comparison of captions generated by the proposed method against state-of-the-art techniques demonstrates its efficacy.
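This record does not specify the three proposed scores, so the following is a purely illustrative sketch of how object-coverage metrics for a caption generator might be computed; the function name, inputs, and the precision/recall/F1 formulation are assumptions for illustration, not the paper's actual metrics:

```python
def object_detection_scores(caption_objects, reference_objects):
    """Hypothetical object-coverage metrics for a generated caption.

    caption_objects:   objects mentioned in the generated caption
    reference_objects: objects actually present in the video (ground truth)
    Returns (precision, recall, f1) over the sets of object names.
    """
    cap, ref = set(caption_objects), set(reference_objects)
    hit = cap & ref  # objects the caption mentions that really appear

    precision = len(hit) / len(cap) if cap else 0.0
    recall = len(hit) / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Example: caption mentions a "car" that is not in the video,
# and misses the "frisbee" and "park" that are.
p, r, f = object_detection_scores(
    ["dog", "ball", "car"],
    ["dog", "ball", "frisbee", "park"],
)
```

Set-based matching like this ignores word order and duplicates; a real evaluation would also need to normalize synonyms and plural forms before comparing.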
Pages: 7