Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks

Cited by: 1
Authors
Daniel, Vasile [1]
Lukasiewicz, Thomas [1]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Structured video captioning; Video understanding; WEB;
DOI
10.1007/978-3-030-02671-4_20
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most current solutions to vision-to-language problems are inspired by machine translation methods, aiming to map visual features directly to text, several recent results on image and video understanding have shown the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos, and evaluates it on a dataset of formally annotated videos, which has been automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it describes the semantic content of a video scene with accuracy similar to that of state-of-the-art natural language captioning models.
Pages: 315-332
Number of pages: 18
Related papers
50 in total
  • [1] Understanding Objects in Video: Object-Oriented Video Captioning via Structured Trajectory and Adversarial Learning
    Zhu, Fangyi
    Hwang, Jenq-Neng
    Ma, Zhanyu
    Chen, Guang
    Guo, Jun
    [J]. IEEE ACCESS, 2020, 8 : 169146 - 169159
  • [2] Unified Graph Structured Models for Video Understanding
    Arnab, Anurag
    Sun, Chen
    Schmid, Cordelia
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8097 - 8106
  • [3] Video Event Understanding using Natural Language Descriptions
    Ramanathan, Vignesh
    Liang, Percy
    Li Fei-Fei
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 905 - 912
  • [4] Leveraging Video Descriptions to Learn Video Question Answering
    Zeng, Kuo-Hao
    Chen, Tseng-Hung
    Chuang, Ching-Yao
    Liao, Yuan-Hong
    Niebles, Juan Carlos
    Sun, Min
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4334 - 4340
  • [5] Deep Learning Methods for Video Understanding
    dos Santos, Gabriel N. P.
    de Freitas, Pedro V. A.
    Busson, Antonio Jose G.
    Guedes, Alan L., V
    Milidi, Ruy
    Colcher, Sergio
    [J]. WEBMEDIA 2019: PROCEEDINGS OF THE 25TH BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB, 2019, : 21 - 23
  • [6] Combined Application of Video Semantic Understanding Technology for Music Video Information Learning
    Liu, Songhu
    Yang, Qi
    Gong, Tianzhuo
    [J]. Computer-Aided Design and Applications, 2023, 20 (S10): : 34 - 44
  • [7] Video summarization using textual descriptions for authoring video blogs
    Otani, Mayu
    Nakashima, Yuta
    Sato, Tomokazu
    Yokoya, Naokazu
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (09) : 12097 - 12115
  • [8] Video summarization using textual descriptions for authoring video blogs
    Mayu Otani
    Yuta Nakashima
    Tomokazu Sato
    Naokazu Yokoya
    [J]. Multimedia Tools and Applications, 2017, 76 : 12097 - 12115
  • [9] HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks
    Szeto, Ryan
    El-Khamy, Mostafa
    Lee, Jungwon
    Corso, Jason J.
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3079 - 3088
  • [10] Video portal for a media space of structured video streams
    Ogura, T
    Babaguchi, N
    Kitahashi, T
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A309 - A312