Audio-guided audiovisual data segmentation, indexing, and retrieval

Cited by: 4
Authors
Zhang, T [1 ]
Kuo, CCJ [1 ]
Affiliations
[1] Univ So Calif, Integrated Media Syst Ctr, Los Angeles, CA 90089 USA
Keywords
audiovisual data processing; segmentation and indexing; audio content analysis; audio-assisted video retrieval; audiovisual database; hidden Markov model;
DOI
10.1117/12.333851
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
While current approaches to video segmentation and indexing focus mostly on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The accompanying audio signal of the audiovisual data is first segmented and classified into basic types: speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based on morphological and statistical analysis of several short-term features of the audio signal. Environmental sounds are then classified into finer classes such as applause, explosions, and bird sounds. This fine-level classification and indexing step is based on time-frequency analysis of the audio signal, with a hidden Markov model (HMM) as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach achieves an accuracy above 90% for coarse-level classification and above 85% for fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
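As a rough illustration of what "short-term features" can look like in a coarse-level step, the sketch below computes two commonly used frame-level measures, short-time energy and zero-crossing rate, and flags silent frames with an energy threshold. The abstract does not name the features the authors use, so the choice of these two features, the function names, the frame sizes, and the threshold value are all assumptions made for illustration and are not the paper's method.

```python
import math

def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def label_silence(signal, frame_len=400, hop=200, energy_threshold=1e-4):
    """Label each frame 'silence' or 'non-silence' by energy.

    The threshold is a hypothetical value chosen for this toy signal.
    """
    return ["silence" if short_time_energy(f) < energy_threshold else "non-silence"
            for f in frames(signal, frame_len, hop)]

# Toy signal: 0.1 s of silence followed by 0.1 s of a 440 Hz tone, sampled at 8 kHz.
sample_rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
labels = label_silence(silence + tone)
tone_zcr = zero_crossing_rate(tone[:400])
```

In a fuller system, per-frame values like these would feed a classifier that separates speech, music, and environmental sound; that stage, and the HMM-based fine-level stage, are outside the scope of this sketch.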
Pages: 316-327
Page count: 12
Related papers
50 in total
  • [1] A generic audio classification and segmentation approach for multimedia indexing and retrieval
    Kiranyaz, S
    Qureshi, AF
    Gabbouj, M
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03) : 1062 - 1081
  • [2] Audio-guided blind biopsy needle placement
    Wegner, K
    Karron, DB
    [J]. MEDICINE MEETS VIRTUAL REALITY: ART, SCIENCE, TECHNOLOGY: HEALTHCARE (R)EVOLUTION TM, 1998, 50 : 90 - 95
  • [3] Audio-Guided Video-Based Face Recognition
    Tang, Xiaoou
    Li, Zhifeng
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2009, 19 (07) : 955 - 964
  • [4] Indexing and Retrieval of Audio: A Survey
    Guojun Lu
    [J]. Multimedia Tools and Applications, 2001, 15 : 269 - 290
  • [5] Indexing and retrieval of audio: A survey
    Lu, GJ
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2001, 15 (03) : 269 - 290
  • [6] Audio content analysis for online audiovisual data segmentation and classification
    Zhang, T
    Kuo, CCJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04) : 441 - 457
  • [7] DISTBIC: A speaker-based segmentation for audio data indexing
    Delacourt, P
    Wellekens, CJ
    [J]. SPEECH COMMUNICATION, 2000, 32 (1-2) : 111 - 126
  • [8] Sound in media: audio drama and audio-guided tours as stimuli for the creation of place
    Wissmann, Torsten
    Zimmermann, Stefan
    [J]. GEOJOURNAL, 2015, 80 (06) : 803 - 810
  • [9] Audio-guided implicit neural representation for local image stylization
    Lee, Seung Hyun
    Kim, Sieun
    Byeon, Wonmin
    Oh, Gyeongrok
    In, Sumin
    Park, Hyeongcheol
    Yoon, Sang Ho
    Hong, Sung-Hee
    Kim, Jinkyu
    Kim, Sangpil
    [J]. COMPUTATIONAL VISUAL MEDIA, 2024
  • [10] Audio-guided Video Interpolation via Human Pose Features
    Nakatsuka, Takayuki
    Hamanaka, Masatoshi
    Morishima, Shigeo
    [J]. PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 5: VISAPP, 2020 : 27 - 35