Video content parsing based on combined audio and visual information

Times cited: 2

Authors:

Zhang, T [1]

Kuo, CCJ [1]

Affiliation:

[1] Univ So Calif, Integrated Media Syst Ctr, Los Angeles, CA 90089 USA

Source:

MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS IV | 1999 / Vol. 3846

Keywords:

video content parsing; audio content analysis; audiovisual data segmentation and indexing; audiovisual database management; information filtering and retrieval

DOI:

10.1117/12.360413

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

While previous research on audiovisual data segmentation and indexing has focused primarily on the pictorial part, significant clues contained in the accompanying audio flow are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, this research presents tools for both audio and visual content analysis and a scheme for video segmentation and annotation. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Each audio scene is then classified and indexed as one of several basic audio types (e.g. speech, music, song, environmental sound, and speech with music background), while each visual shot is represented by keyframes and their associated image features. Finally, an index table is generated automatically for each video clip by integrating the outputs of the audio and visual analysis. The proposed system is shown to provide satisfactory video indexing results.
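
The segmentation-and-indexing pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: audio and visual features are taken as already extracted per-frame vectors on a shared frame index; a plain Euclidean-distance threshold between consecutive frames stands in for the paper's abrupt-change detectors; audio-type labels are supplied by some upstream classifier; and the first frame of each shot serves as a hypothetical keyframe. The names `detect_boundaries`, `to_segments`, `build_index`, and `IndexEntry` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def detect_boundaries(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return frame indices where the feature vector jumps by more than `threshold`.

    Assumption: an "abrupt change" is approximated by the Euclidean distance
    between consecutive per-frame feature vectors.
    """
    return [
        i
        for i in range(1, len(features))
        if math.dist(features[i - 1], features[i]) > threshold
    ]


def to_segments(boundaries: List[int], n_frames: int) -> List[Segment]:
    """Turn boundary indices into consecutive half-open segments covering all frames."""
    starts = [0] + boundaries
    ends = boundaries + [n_frames]
    return list(zip(starts, ends))


@dataclass
class IndexEntry:
    shot: Segment
    keyframe: int            # hypothetical choice: first frame of the shot
    audio_types: List[str]   # labels of every audio scene overlapping the shot


def build_index(
    shots: List[Segment], audio_scenes: List[Segment], audio_labels: List[str]
) -> List[IndexEntry]:
    """Combine visual shots and labeled audio scenes into one index table."""
    if len(audio_scenes) != len(audio_labels):
        raise ValueError("each audio scene needs exactly one label")
    table = []
    for start, end in shots:
        overlapping = [
            label
            for (a_start, a_end), label in zip(audio_scenes, audio_labels)
            if a_start < end and a_end > start  # half-open intervals overlap
        ]
        table.append(IndexEntry(shot=(start, end), keyframe=start, audio_types=overlapping))
    return table


if __name__ == "__main__":
    # Toy features: the visual stream jumps at frame 2, the audio stream at frame 3.
    visual = [[0, 0], [0, 0.1], [5, 5], [5, 5.1], [5, 5]]
    audio = [[0], [0], [0], [3], [3]]
    shots = to_segments(detect_boundaries(visual, 1.0), len(visual))  # [(0, 2), (2, 5)]
    scenes = to_segments(detect_boundaries(audio, 1.0), len(audio))   # [(0, 3), (3, 5)]
    for entry in build_index(shots, scenes, ["speech", "music"]):
        print(entry)
```

A single global threshold keeps the sketch short; real audio and visual features would likely need separately tuned thresholds, and the detectors described in the paper may work differently. Because audio scenes and visual shots are segmented independently, a shot can overlap more than one audio scene, which is why each index entry keeps a list of audio types.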

Pages: 78-89

Page count: 12