The Role of the Input in Natural Language Video Description

Cited by: 2
Authors
Cascianelli, Silvia [1]
Costante, Gabriele [1]
Devo, Alessandro [1]
Ciarfuglia, Thomas A. [1]
Valigi, Paolo [1]
Fravolini, Mario L. [1]
Affiliation
[1] Univ Perugia, Dept Engn, I-06123 Perugia, Italy
Keywords
Video description; multimodal data; input preprocessing; IMAGE; ATTENTION; TEXT;
DOI
10.1109/TMM.2019.2924598
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Natural language video description (NLVD) has recently received strong interest in the computer vision, natural language processing (NLP), multimedia, and autonomous robotics communities. State-of-the-art (SotA) approaches obtain remarkable results on the benchmark datasets but generalize poorly to new ones. In addition, no existing work focuses on the processing of the input to NLVD systems, which is both visual and textual. This paper presents an extensive study of the role of the visual input, evaluated with respect to the overall NLP performance. The study performs data augmentation of the visual component, applying common transformations that model the camera distortions, noise, lighting, and camera positioning typical of real-world operating scenarios. A t-SNE-based analysis is proposed to evaluate the effects of these transformations on the overall visual data distribution. The study uses the English subset of the Microsoft Research Video Description (MSVD) dataset, which is commonly used for NLVD. This dataset was found to contain a considerable number of syntactic and semantic errors. These errors have been corrected manually, and the new version of the dataset (called MSVD-v2) is used in the experiments. The MSVD-v2 dataset is released to help gain insight into the NLVD problem.
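The abstract names the kinds of visual transformations studied (noise, lighting, camera positioning) but not how they are implemented. The following is a minimal sketch of what such augmentations might look like on a grayscale frame, using only the Python standard library. All function names and parameter values here are hypothetical illustrations and are not taken from the paper.

```python
import random

# A grayscale frame: rows of pixel intensities in 0..255 (an assumed toy format).
Frame = list[list[int]]


def clamp(value: float) -> int:
    """Round to the nearest integer and keep the result inside 0..255."""
    return max(0, min(255, int(round(value))))


def add_gaussian_noise(frame: Frame, sigma: float, rng: random.Random) -> Frame:
    """Simulate sensor noise by adding zero-mean Gaussian noise to each pixel."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in frame]


def scale_brightness(frame: Frame, factor: float) -> Frame:
    """Simulate a lighting change by scaling every pixel intensity."""
    return [[clamp(p * factor) for p in row] for row in frame]


def shift_horizontal(frame: Frame, dx: int, fill: int = 0) -> Frame:
    """Simulate a camera displacement: positive dx moves content to the right."""
    width = len(frame[0])
    shifted = []
    for row in frame:
        if dx >= 0:
            new_row = [fill] * min(dx, width) + row[: max(width - dx, 0)]
        else:
            new_row = row[-dx:] + [fill] * min(-dx, width)
        shifted.append(new_row)
    return shifted


def augment_clip(frames: list[Frame], rng: random.Random) -> list[Frame]:
    """Apply one randomly drawn set of parameters to every frame of a clip,
    so the simulated camera condition is consistent over time."""
    sigma = rng.uniform(0.0, 10.0)    # assumed noise range
    factor = rng.uniform(0.6, 1.4)    # assumed brightness range
    dx = rng.randint(-2, 2)           # assumed shift range, in pixels
    out = []
    for frame in frames:
        f = scale_brightness(frame, factor)
        f = shift_horizontal(f, dx)
        f = add_gaussian_noise(f, sigma, rng)
        out.append(f)
    return out
```

Drawing the parameters once per clip rather than once per frame is a design choice in this sketch: it keeps the simulated camera condition stable across the video, which seems closer to a fixed real-world setup than independent per-frame perturbations.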
Pages: 271-283
Page count: 13