Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 06

Funding:

U.S. National Science Foundation;

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.

Pages: 1406 - 1416

Page count: 11

Related records (50 in total):

[1] Robust Audio-based Classification of Video Genre
Rouvier, Mickael
Linares, Georges
Matrouf, Driss
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1155 - 1158
[2] Factor Analysis for Audio-based Video Genre Classification
Rouvier, Mickael
Matrouf, Driss
Linares, Georges
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1131 - 1134
[3] Audio-Based Video Genre Identification
Rouvier, Mickael
Oger, Stanislas
Linares, Georges
Matrouf, Driss
Merialdo, Bernard
Li, Yingbo
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (06) : 1031 - 1041
[4] AUDIO-BASED NONLINEAR VIDEO DIFFUSION
Casanovas, Anna Llagostera
Vandergheynst, Pierre
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2486 - 2489
[5] AUDIO-BASED CLASSIFICATION OF SPEAKER CHARACTERISTICS
Dutta, Promiti
Haubold, Alexander
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 422 - 425
[6] Audio-based event detection for sports video
Baillie, M
Jose, JM
IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2003, 2728 : 300 - 309
[7] Combining audio-based and video-based shot classification systems for news videos segmentation
De Santo, M
Percannella, G
Sansone, C
Vento, M
MULTIPLE CLASSIFIER SYSTEMS, 2005, 3541 : 397 - 406
[8] A Survey of Audio-Based Music Classification and Annotation
Fu, Zhouyu
Lu, Guojun
Ting, Kai Ming
Zhang, Dengsheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2011, 13 (02) : 303 - 319
[9] Hierarchical structure for audio-video based semantic classification of sports video sequences
Kolekar, MH
Sengupta, S
VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2005, PTS 1-4, 2005, 5960 : 401 - 409
[10] Audio-Based Music Classification with DenseNet and Data Augmentation
Bian, Wenhao
Wang, Jie
Zhuang, Bojin
Yang, Jiankui
Wang, Shaojun
Xiao, Jing
PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 56 - 65