Multi-Perspective Video Captioning

Cited by: 9
Authors
Bin, Yi [1]
Shang, Xindi [2]
Peng, Bo [3]
Ding, Yujuan [4]
Chua, Tat-Seng [5]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Tianjin Univ, Sch Elect & Informat, Tianjin, Peoples R China
[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[5] Natl Univ Singapore, Sea NExT Joint Lab, Singapore, Singapore
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
multi-perspective video captioning; dataset
DOI
10.1145/3474085.3475173
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work targets the problems of comprehensive video captioning and of generating multiple descriptions from different perspectives, a task termed Multi-Perspective Video Captioning. We build and release VidOR-MPVC, the first dataset for multi-perspective video captioning, in which each video is annotated with multiple descriptions from different perspectives. We also propose a novel model, dubbed the perspective-aware captioner (PAC), which mines the various perspectives in a video and generates a description from each. More specifically, a perspective generator is designed to perceive video content with perspective preferences, followed by a language generator equipped with a perspective-aware attention mechanism. Because the new task expects multiple descriptions per video, existing evaluation metrics fail to handle this situation. To address this problem, we devise maximum matching scores based on existing metrics for an overall evaluation that aims to cover semantic similarity, completeness, and compactness. The experimental results demonstrate that our model is able to describe videos with multiple descriptions from different perspectives.
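The abstract does not give the formula for the maximum matching scores, so the following is only a minimal sketch of one plausible reading: pair generated and reference captions one-to-one so that the summed pairwise score is maximal, then divide by the size of the larger set so that both missing captions (completeness) and surplus captions (compactness) lower the result. The names `token_jaccard` and `max_matching_score`, the normalisation, and the use of token Jaccard overlap in place of an existing captioning metric are all assumptions made for illustration, not the authors' definitions.

```python
from itertools import permutations


def token_jaccard(candidate: str, reference: str) -> float:
    """Hypothetical stand-in for an existing pairwise caption metric."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    if not cand and not ref:
        return 1.0
    return len(cand & ref) / len(cand | ref)


def max_matching_score(predictions, references, metric=token_jaccard):
    """Best one-to-one matching total, divided by the larger set size.

    Assumed formulation: the normalisation by max(len) is an illustrative
    choice, not taken from the paper.
    """
    if not predictions or not references:
        return 0.0
    # scores[i][j] = metric(prediction i, reference j)
    scores = [[metric(p, r) for r in references] for p in predictions]
    n_pred, n_ref = len(predictions), len(references)
    best = 0.0
    if n_pred <= n_ref:
        # Assign each prediction to a distinct reference.
        for perm in permutations(range(n_ref), n_pred):
            total = sum(scores[i][perm[i]] for i in range(n_pred))
            best = max(best, total)
    else:
        # Assign each reference to a distinct prediction.
        for perm in permutations(range(n_pred), n_ref):
            total = sum(scores[perm[j]][j] for j in range(n_ref))
            best = max(best, total)
    return best / max(n_pred, n_ref)
```

The exhaustive search is factorial in the number of captions, which is acceptable only for a handful of descriptions per video; for larger sets, a polynomial-time assignment solver such as the Hungarian algorithm would be the usual substitute.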
Pages: 5110-5118
Page count: 9