Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57

Authors:

Lee, Keansub [1]

Ellis, Daniel P. W. [1]

Affiliation:

[1] Columbia University, Department of Electrical Engineering, Laboratory for the Recognition and Organization of Speech and Audio (LabROSA), New York, NY 10027 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 6

Funding:

U.S. National Science Foundation (NSF);

Keywords:

Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;

DOI:

10.1109/TASL.2009.2034776

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
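The clip-level pipeline described in the abstract (summarize a clip's MFCC frames with a single Gaussian, then compare clips with a divergence that an SVM kernel can use) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the MFCC frames are already computed, uses diagonal covariances for simplicity, covers only the Kullback-Leibler and Bhattacharyya measures (not Mahalanobis, GMM, or pLSA), and the helper names and the `exp(-D / gamma)` kernel form are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A clip summary: (per-dimension means, per-dimension variances).
# Assumption: diagonal covariance rather than a full covariance matrix.
Gaussian = Tuple[List[float], List[float]]


def fit_diag_gaussian(frames: Sequence[Sequence[float]], var_floor: float = 1e-6) -> Gaussian:
    """Summarize a clip's MFCC frames (one row per frame) by mean and variance per dimension.

    The variance floor (a hypothetical parameter) avoids division by zero later.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def kl_divergence(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal Gaussians, in nats."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """KL is asymmetric, so sum both directions to get a symmetric distance."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def bhattacharyya(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v = 0.5 * (vp + vq)  # averaged variance for this dimension
        total += (mp - mq) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(vp * vq))
    return total


def gram_matrix(models: Sequence[Gaussian], distance, gamma: float = 1.0) -> List[List[float]]:
    """Turn pairwise distances into kernel values K = exp(-D / gamma).

    Hypothetical kernel form; exp(-D) is not guaranteed to be positive
    semidefinite for these divergences.
    """
    return [[math.exp(-distance(a, b) / gamma) for b in models] for a in models]


# Usage: two one-dimensional "clips" with equal variance and means one unit apart.
clip_a = fit_diag_gaussian([[0.0], [2.0]])  # mean 1.0, variance 1.0
clip_b = fit_diag_gaussian([[1.0], [3.0]])  # mean 2.0, variance 1.0
print(symmetric_kl(clip_a, clip_b))   # 1.0
print(bhattacharyya(clip_a, clip_b))  # 0.125
```

A Gram matrix built this way can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Symmetrizing the KL divergence matters here because a kernel matrix must be symmetric.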


Pages: 1406-1416

Number of pages: 11
