A framework for creating natural language descriptions of video streams

Cited by: 5
Authors
Khan, Muhammad Usman Ghani [1]
Al Harbi, Nouf [2]
Gotoh, Yoshihiko [2]
Affiliations
[1] Univ Engn & Technol, Dept Comp Sci, Lahore, Pakistan
[2] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
Keywords
Video retrieval; Video annotation; Natural language generation; FACE DETECTION; RECOGNITION
DOI
10.1016/j.ins.2014.12.034
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This contribution addresses the generation of natural language descriptions for important visual content present in video streams. The work starts with the implementation of conventional image processing techniques to extract high-level visual features such as humans and their activities. These features are converted into natural language descriptions using a template-based approach built on a context-free grammar, incorporating spatial and temporal information. The task is particularly challenging because feature extraction processes are error-prone at various levels. In this paper we explore approaches to accommodating potentially missing information, thus creating a coherent description. Sample automatic annotations are created for video clips presenting humans' close-ups and actions, and a qualitative analysis of the approach is made from various perspectives. Additionally, a task-based scheme is introduced that provides quantitative evaluation of the relevance of the generated descriptions. Further, to show the framework's potential for extension, a scalability study is conducted using video categories that were not targeted during development. (C) 2014 Elsevier Inc. All rights reserved.
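The abstract's central mechanism, turning extracted visual features into sentences through a context-free grammar while tolerating features the detectors failed to extract, can be illustrated with a toy generator. This is a minimal sketch under our own assumptions, not the authors' grammar: the feature names (`gender`, `action`, `position`, `time`), the `GRAMMAR` rules and the "first realisable alternative wins" strategy are all hypothetical. Each nonterminal lists its alternatives in order of preference; an alternative that needs a missing feature is skipped, so a more generic or empty alternative stands in for the absent information.

```python
from typing import Optional

# Hypothetical high-level features for one video segment. None (or a missing
# key) marks a feature that the assumed detector failed to extract.
Features = dict[str, Optional[str]]

# A toy context-free grammar (illustrative only). Angle-bracket symbols are
# nonterminals, {slot} tokens are filled from the features, everything else
# is a literal word. Alternatives are tried in order; [] means "omit".
GRAMMAR: dict[str, list[list[str]]] = {
    "<S>": [["<NP>", "<VP>", "<PP_LOC>", "<PP_TIME>"]],
    "<NP>": [["a", "{gender}"], ["a", "person"]],        # fallback if gender missing
    "<VP>": [["is", "{action}"], ["is", "present"]],      # fallback if action missing
    "<PP_LOC>": [["on", "the", "{position}"], []],        # spatial phrase, optional
    "<PP_TIME>": [["at", "{time}"], []],                  # temporal phrase, optional
}


def expand(symbol: str, feats: Features) -> Optional[list[str]]:
    """Return the words realising `symbol`, or None if it cannot be realised."""
    if symbol in GRAMMAR:
        for alternative in GRAMMAR[symbol]:
            words: list[str] = []
            for sym in alternative:
                part = expand(sym, feats)
                if part is None:
                    break  # this alternative depends on a missing feature
                words.extend(part)
            else:
                return words  # every symbol of this alternative was realised
        return None  # no alternative could be realised
    if symbol.startswith("{") and symbol.endswith("}"):
        value = feats.get(symbol[1:-1])
        return None if value is None else [value]
    return [symbol]  # literal word


def describe(feats: Features) -> str:
    """Generate one capitalised sentence from the features, or '' if impossible."""
    words = expand("<S>", feats)
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."
```

For example, `describe({"gender": "man", "action": "walking", "position": "left", "time": None})` yields "A man is walking on the left.", while a segment where every detector failed still yields the coherent fallback "A person is present." Ordering alternatives from most to least specific is one simple way to degrade gracefully; the paper itself may handle missing information differently.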
Pages: 61-82
Number of pages: 22
Related papers
50 in total
  • [1] Translating Video Content to Natural Language Descriptions
    Rohrbach, Marcus
    Qiu, Wei
    Titov, Ivan
    Thater, Stefan
    Pinkal, Manfred
    Schiele, Bernt
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 433 - 440
  • [2] Video Event Understanding using Natural Language Descriptions
    Ramanathan, Vignesh
    Liang, Percy
    Li Fei-Fei
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 905 - 912
  • [3] Towards Coherent Natural Language Description of Video Streams
    Khan, Muhammad Usman Ghani
    Zhang, Lei
    Gotoh, Yoshihiko
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
  • [4] GENERATING COHERENT NATURAL LANGUAGE ANNOTATIONS FOR VIDEO STREAMS
    Khan, Muhammad Usman Ghani
    Zhang, Lei
    Gotoh, Yoshihiko
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 2893 - 2896
  • [5] Natural language descriptions of human Behavior from video sequences
    Tena, Carles Fernandez
    Baiget, Pau
    Roca, Xavier
    Gonzalez, Jordi
    [J]. KI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4667 : 279 - +
  • [6] Conceptual representations between video signals and natural language descriptions
    Arens, M.
    Gerber, R.
    Nagel, H. -H.
    [J]. IMAGE AND VISION COMPUTING, 2008, 26 (01) : 53 - 66
  • [7] Extracting Information for Creating SAPPhIRE Model of Causality from Natural Language Descriptions
    Bhattacharya, Kausik
    Bhatt, Apoory Naresh
    Ranjan, B. S. C.
    Keshwani, Sonal
    Srinivasan, V
    Chakrabarti, Amaresh
    [J]. DESIGN COMPUTING AND COGNITION'22, 2023, : 3 - 20
  • [8] A framework for learning semantic maps from grounded natural language descriptions
    Walter, Matthew R.
    Hemachandra, Sachithra
    Homberg, Bianca
    Tellex, Stefanie
    Teller, Seth
    [J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2014, 33 (09) : 1167 - 1190
  • [9] Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework
    Huo, Lin
    Bai, Lin
    Zhou, Shang-Ming
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (08) : 7441 - 7452
  • [10] Natural Language Description of Video Streams Using Task-Specific Feature Encoding
    Dilawari, Aniqa
    Khan, Muhammad Usman Ghani
    Farooq, Ammarah
    Zahoor-Ur-Rehman
    Rho, Seungmin
    Mehmood, Irfan
    [J]. IEEE ACCESS, 2018, 6 : 16639 - 16645