Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach

Cited by: 3

Authors:

Pini, Stefano [1]

Cornia, Marcella [1]

Baraldi, Lorenzo [1]

Cucchiara, Rita [1]

Affiliations:

[1] Univ Modena & Reggio Emilia, Dipartimento Ingn Enzo Ferrari, Modena, Italy

Source:

IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT II | 2017 / Vol. 10485

Keywords:

Video captioning; Naming; Datasets; Deep learning;

DOI:

10.1007/978-3-319-68548-9_36

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current approaches for movie description lack the ability to name characters with their proper names, and can only indicate people with a generic "someone" tag. In this paper we present two contributions towards the development of video description architectures with naming capabilities: firstly, we collect and release an extension of the popular Montreal Video Annotation Dataset in which the visual appearance of each character is linked both through time and to textual mentions in captions. We annotate, in a semi-automatic manner, a total of 53k face tracks and 29k textual mentions on 92 movies. Moreover, to underline and quantify the challenges of the task of generating captions with names, we present different multi-modal approaches to solve the problem on already generated captions.

Pages: 384-395

Page count: 12
