Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliations:

[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 06

Funding:

U.S. National Science Foundation;

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
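As a rough illustration of the simplest pipeline the abstract names (single Gaussian modeling of each clip's MFCC frames, compared with a Kullback-Leibler measure), the sketch below fits one Gaussian per clip and turns pairwise symmetrized KL divergences into a kernel matrix. It is not the authors' implementation. The random arrays stand in for real MFCC frames, and the function names, the 13-dimensional features, the `1e-6` covariance regularizer, the `exp(-gamma * D)` kernel form, and the `gamma` value are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian(frames):
    """Summarize a (n_frames, n_dims) feature array by its mean and full covariance."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False)
    # Small diagonal load keeps the covariance invertible (assumed value).
    cov = cov + 1e-6 * np.eye(frames.shape[1])
    return mean, cov

def kl_gaussian(p, q):
    """Closed-form KL(p || q) between two multivariate Gaussians given as (mean, cov)."""
    (mean_p, cov_p), (mean_q, cov_q) = p, q
    dims = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff
                  - dims + logdet_q - logdet_p)

def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_gaussian(p, q) + kl_gaussian(q, p)

def kl_kernel(models, gamma=0.1):
    """Kernel matrix exp(-gamma * D) from pairwise symmetrized KL (assumed form)."""
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(models[i], models[j])
    return np.exp(-gamma * dist)

# Synthetic stand-ins for MFCC frames: two similar clips and one shifted clip.
rng = np.random.default_rng(0)
clips = [rng.normal(loc=shift, size=(200, 13)) for shift in (0.0, 0.0, 2.0)]
models = [fit_gaussian(clip) for clip in clips]
K = kl_kernel(models)
```

A matrix like `K` could then be handed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Note that a kernel built this way is not guaranteed to be positive semidefinite, so this is a starting point under the stated assumptions rather than a faithful reproduction of the paper's classifiers.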


Pages: 1406-1416

Page count: 11
