Reducing the cost of metadata generation by using video/audio indexing and natural language processing techniques

Authors
Kuwano, Hidetaka [1]
Matsuo, Yoshihiro [2]
Kawazoe, Katsuhiko [2]
Affiliations
[1] NTT Cyber Solutions Laboratories, Yokosuka-shi, 239-0947, Japan
[2] NTT Cyber Solutions Laboratories
Source
NTT Technical Review | 2004 / Vol. 2 / No. 08
Keywords
Character recognition - Costs - Image segmentation - Indexing (of information) - Natural language processing systems - Network protocols - Semantics - Speech recognition - Television broadcasting - User interfaces
DOI
Not available
Abstract
Reducing the cost of generating metadata will allow more broadcast content to be transmitted with advanced viewing options. In this article, we describe SceneCabinet, a system that automatically extracts scene-based semantic metadata from video content. It extracts meaningful video slices and their associated textual information, such as the title, synopsis, and keywords, by applying natural language processing to the results of speech and on-screen text recognition. It can also import video program scripts and use them for automatic keyword extraction. SceneCabinet provides an intuitive user interface, including a video browser with key images detected automatically from scene changes, on-screen text, camerawork, speech, and music information. Experiments showed that SceneCabinet can significantly reduce metadata generation costs.
Pages: 68-74
Related papers
  • [1] Effects of task-cost reduction on metadata generation using audio/visual indexing and natural-language processing techniques
    Kuwano, Hidetaka
    Matsuo, Yoshihiro
    Kawazoe, Katsuhiko
    [J]. Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 2007, 61 (06): 842-852
  • [2] Semantic Hierarchical Indexing for Online Video Lessons Using Natural Language Processing
    Arazzi, Marco
    Ferretti, Marco
    Nocera, Antonino
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2023, 7 (02)
  • [3] Summary Generation Using Natural Language Processing Techniques and Cosine Similarity
    Pal, Sayantan
    Chang, Maiga
    Iriarte, Maria Fernandez
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418: 508-517
  • [4] Indexing audio-visual sequences by joint audio and video processing
    Saraceno, C
    Leonardi, R
    [J]. VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998: 686-691
  • [5] Video indexing using speech recognition techniques in audio channel preliminary system design
    Gu, LY
    [J]. Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004: 342-345
  • [6] Indexing audiovisual databases through joint audio and video processing
    Saraceno, C
    Leonardi, R
    [J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 1998, 9 (05): 320-331
  • [7] Automated Extraction of Semantic Legal Metadata Using Natural Language Processing
    Sleimi, Amin
    Sannier, Nicolas
    Sabetzadeh, Mehrdad
    Briand, Lionel C.
    Dann, John
    [J]. 2018 IEEE 26TH INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE (RE 2018), 2018: 124-135
  • [8] Generation of Oracles using Natural Language Processing
    Leong, Iat Tou
    Barbosa, Raul
    [J]. 2021 28TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE WORKSHOPS (APSECW 2021), 2021: 25-31
  • [9] Content-based Recommendation for Podcast Audio-items using Natural Language Processing Techniques
    Xing, Zhou
    Parandehgheibi, Marzieh
    Xiao, Fei
    Kulkarni, Nilesh
    Pouliot, Chris
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 2378-2383
  • [10] Food Recipe Alternation and Generation with Natural Language Processing Techniques
    Pan, Yuran
    Xu, Qiangwen
    Li, Yanjun
    [J]. 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW 2020), 2020: 94-97