A framework for creating natural language descriptions of video streams

Cited by: 5
Authors
Khan, Muhammad Usman Ghani [1]
Al Harbi, Nouf [2]
Gotoh, Yoshihiko [2]
Affiliations
[1] Univ Engn & Technol, Dept Comp Sci, Lahore, Pakistan
[2] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
Keywords
Video retrieval; Video annotation; Natural language generation; Face detection; Recognition
DOI
10.1016/j.ins.2014.12.034
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This contribution addresses the generation of natural language descriptions for important visual content present in video streams. The work starts with the implementation of conventional image processing techniques to extract high-level visual features such as humans and their activities. These features are converted into natural language descriptions using a template-based approach built on a context-free grammar, incorporating spatial and temporal information. The task is challenging, particularly because feature extraction processes are erroneous at various levels. In this paper we explore approaches to accommodating potentially missing information, thus creating a coherent description. Sample automatic annotations are created for video clips presenting close-ups of humans and their actions, and a qualitative analysis of the approach is made from various aspects. Additionally, a task-based scheme is introduced that provides quantitative evaluation of the relevance of generated descriptions. Further, to show the framework's potential for extension, a scalability study is conducted using video categories that were not targeted during development. © 2014 Elsevier Inc. All rights reserved.
Pages: 61-82
Page count: 22
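
The abstract describes turning extracted visual features into sentences with a template-based approach, while coping with features that the extractors failed to produce. The following is a minimal illustrative sketch of that general idea only, not the authors' grammar or implementation. The `Detection` fields, the `S -> NP VP PP?` rule, the fallback wording, and the "Then" connective are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Hypothetical feature-extractor output for one video segment."""
    subject: Optional[str] = None   # e.g. "man"; None if no detector fired
    action: Optional[str] = None    # e.g. "walking"
    obj: Optional[str] = None       # e.g. "phone"
    location: Optional[str] = None  # spatial cue, e.g. "on the left"


def noun_phrase(noun: str) -> str:
    """NP -> Det N, choosing 'a' or 'an' from the first letter."""
    det = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{det} {noun}"


def describe(d: Detection) -> str:
    """S -> NP VP PP?, dropping or defaulting constituents that are missing."""
    np = noun_phrase(d.subject or "person")  # fall back to a generic subject
    if d.action:
        vp = f"is {d.action}"
        if d.obj:
            vp += f" {noun_phrase(d.obj)}"
    else:
        vp = "is present"                    # no action recognised
    pp = f" {d.location}" if d.location else ""
    return f"{np} {vp}{pp}"


def describe_sequence(segments: List[Detection]) -> str:
    """Join per-segment clauses in temporal order with a simple connective."""
    sentences = []
    for i, d in enumerate(segments):
        clause = describe(d)
        if i == 0:
            sentences.append(clause[0].upper() + clause[1:] + ".")
        else:
            sentences.append("Then " + clause + ".")
    return " ".join(sentences)


if __name__ == "__main__":
    clip = [
        Detection(subject="woman", action="walking", location="on the left"),
        Detection(subject="woman"),  # action detector failed on this segment
    ]
    print(describe_sequence(clip))
    # A woman is walking on the left. Then a woman is present.
```

Defaulting each missing constituent separately keeps every output grammatical even when only part of the feature set is available, which is the property the abstract refers to as creating a coherent description from potentially missing information.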