Multi-Perspective Video Captioning

Cited by: 9

Authors:

Bin, Yi [1]

Shang, Xindi [2]

Peng, Bo [3]

Ding, Yujuan [4]

Chua, Tat-Seng [5]

Affiliations:

[1] Univ Elect Sci & Technol China, Ctr Future Media, Hefei, Peoples R China

[2] Natl Univ Singapore, Singapore, Singapore

[3] Tianjin Univ, Sch Elect & Informat, Tianjin, Peoples R China

[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China

[5] Natl Univ Singapore, Sea NExT Joint Lab, Singapore, Singapore

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

multi-perspective video captioning; dataset;

DOI:

10.1145/3474085.3475173

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work targets the problems of comprehensive video captioning and of generating multiple descriptions from different perspectives, termed Multi-Perspective Video Captioning. We build and release a dataset named VidOR-MPVC, the first dataset for multi-perspective video captioning, where each video is annotated with multiple descriptions from different perspectives. We also propose a novel model, dubbed perspective-aware captioner (PAC), which is capable of mining the various perspectives in a video and generating a description from each perspective. More specifically, a perspective generator is designed to perceive video content with perspective preferences, followed by a language generator equipped with a perspective-aware attention mechanism. As our new task expects multiple descriptions to be produced for a video, existing evaluation metrics fail to handle this situation. To address this problem, we devise maximum matching scores based on existing metrics for an overall evaluation that aims to cover the aspects of semantic similarity, completeness and compactness. The experimental results demonstrate that our model is able to describe videos with multiple descriptions from different perspectives.
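
The abstract does not spell out how the maximum matching scores are computed, so the following is only an illustrative sketch of one way such a score could work, not the authors' formulation. It assumes the score is the best one-to-one assignment between generated and reference captions under some pairwise metric, normalised by the size of the larger set so that missing captions (completeness) and surplus captions (compactness) both lower the result. The names `token_f1` and `max_matching_score` are hypothetical, and `token_f1` is a toy stand-in for an existing captioning metric.

```python
from itertools import permutations


def token_f1(candidate: str, reference: str) -> float:
    """Toy pairwise metric (an assumption): F1 over the two sets of tokens."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def max_matching_score(generated, references, metric=token_f1):
    """Best one-to-one matching total, divided by the larger set size.

    Brute force over assignments, so only suitable for a handful of
    captions per video. This normalisation is an assumption, not the
    definition used in the paper.
    """
    if not generated or not references:
        return 0.0
    # scores[i][j] compares generated caption i with reference caption j.
    scores = [[metric(g, r) for r in references] for g in generated]
    n_gen, n_ref = len(generated), len(references)
    best = 0.0
    if n_gen <= n_ref:
        # Assign each generated caption to a distinct reference.
        for perm in permutations(range(n_ref), n_gen):
            best = max(best, sum(scores[i][perm[i]] for i in range(n_gen)))
    else:
        # Assign each reference to a distinct generated caption.
        for perm in permutations(range(n_gen), n_ref):
            best = max(best, sum(scores[perm[j]][j] for j in range(n_ref)))
    return best / max(n_gen, n_ref)
```

Under these assumptions, a caption set that matches every reference exactly scores 1, while adding an unmatched extra caption or omitting a reference pulls the score down in proportion to the size of the larger set.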


Pages: 5110-5118

Page count: 9
