Video content parsing based on combined audio and visual information

Cited by: 2
Authors
Zhang, T [1]
Kuo, CCJ [1]
Affiliations
[1] Univ So Calif, Integrated Media Syst Ctr, Los Angeles, CA 90089 USA
Keywords
video content parsing; audio content analysis; audiovisual data segmentation and indexing; audiovisual database management; information filtering and retrieval
DOI
10.1117/12.360413
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
While previous research on audiovisual data segmentation and indexing has focused primarily on the pictorial part, significant clues contained in the accompanying audio flow are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, we present tools for both audio and visual content analysis and a scheme for video segmentation and annotation in this research. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Then, each audio scene is categorized and indexed as one of the basic audio types (e.g., speech, music, song, environmental sound, and speech with music background), while each visual shot is represented by keyframes and associated image features. An index table is then generated automatically for each video clip by integrating the outputs of audio and visual analysis. It is shown that the proposed system provides satisfactory video indexing results.
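The segmentation and indexing steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar per-frame features, the simple difference threshold, the shared frame rate for audio and video, the hand-assigned audio labels, and the function names `detect_boundaries`, `to_segments`, and `build_index` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def detect_boundaries(features: Sequence[float], threshold: float) -> List[int]:
    """Return frame indices where the feature jumps by more than `threshold`.

    Assumption: an "abrupt change" is a large difference between
    consecutive feature values; the paper's actual criterion may differ.
    """
    return [
        i
        for i in range(1, len(features))
        if abs(features[i] - features[i - 1]) > threshold
    ]


def to_segments(boundaries: Sequence[int], length: int) -> List[Segment]:
    """Turn boundary indices into consecutive half-open segments."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [length]
    return list(zip(starts, ends))


def build_index(
    audio_scenes: Sequence[Segment],
    audio_labels: Sequence[str],
    shots: Sequence[Segment],
) -> List[Dict[str, object]]:
    """For each visual shot, list the labels of the audio scenes overlapping it."""
    table = []
    for s_start, s_end in shots:
        labels = [
            label
            for (a_start, a_end), label in zip(audio_scenes, audio_labels)
            if a_start < s_end and s_start < a_end  # ranges overlap
        ]
        table.append({"shot": (s_start, s_end), "audio": labels})
    return table


# Hypothetical per-frame features, sampled at the same rate for simplicity.
audio_energy = [0.1, 0.1, 0.12, 0.9, 0.88, 0.91, 0.2, 0.21]
frame_brightness = [10, 11, 10, 10, 50, 51, 50, 52]

audio_scenes = to_segments(detect_boundaries(audio_energy, 0.5), len(audio_energy))
shots = to_segments(detect_boundaries(frame_brightness, 20), len(frame_brightness))

# Hypothetical labels; a real system would classify each audio scene.
audio_labels = ["speech", "music", "speech"]

index_table = build_index(audio_scenes, audio_labels, shots)
for row in index_table:
    print(row)
```

In this sketch the audio and visual streams are segmented independently and only joined when the index table is built, which mirrors the two-track structure the abstract describes.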
Pages: 78-89
Page count: 12