Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57
Authors
Lee, Keansub [1]
Ellis, Daniel P. W. [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA
Funding
U.S. National Science Foundation
Keywords
Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;
DOI
10.1109/TASL.2009.2034776
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
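As a rough illustration of the simplest pipeline the abstract names (single Gaussian modeling of a clip's frames, compared by Kullback-Leibler or Bhattacharyya distance), the sketch below uses NumPy only. It is not the authors' implementation: the function names, the regularization term `reg`, the symmetrization of the KL divergence, the `exp(-gamma * D)` conversion from distance to kernel, and the random stand-in data used in place of real MFCC frames are all assumptions made for the example.

```python
import numpy as np


def fit_gaussian(frames, reg=1e-6):
    """Summarize a clip (n_frames x n_dims matrix) by its mean and full covariance."""
    mu = frames.mean(axis=0)
    # Small diagonal term (an assumption) keeps the covariance invertible.
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov


def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)


def symmetric_kl(g0, g1):
    """Symmetrized KL divergence, so the result can serve as a distance-like score."""
    return kl_gauss(*g0, *g1) + kl_gauss(*g1, *g0)


def bhattacharyya(g0, g1):
    """Bhattacharyya distance between two multivariate Gaussians."""
    (mu0, cov0), (mu1, cov1) = g0, g1
    cov = 0.5 * (cov0 + cov1)
    diff = mu1 - mu0
    _, logdet = np.linalg.slogdet(cov)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.125 * diff @ np.linalg.solve(cov, diff) + 0.5 * (logdet - 0.5 * (logdet0 + logdet1))


def distance_kernel(models, dist, gamma=0.1):
    """Pairwise kernel matrix K[i, j] = exp(-gamma * dist(models[i], models[j]))."""
    n = len(models)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(models[i], models[j])
    return np.exp(-gamma * D)


# Stand-in data: 4 "clips" of 200 frames x 13 coefficients (not real MFCCs).
rng = np.random.default_rng(0)
clips = [rng.normal(loc=0.5 * k, size=(200, 13)) for k in range(4)]
models = [fit_gaussian(c) for c in clips]
K = distance_kernel(models, symmetric_kl)
```

A matrix such as `K` could then be passed to an SVM implementation that accepts precomputed kernels. Whether such a matrix is positive semidefinite depends on the distance measure and `gamma`, so this is a starting point rather than a guaranteed-valid kernel.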
Pages: 1406-1416
Page count: 11
Related papers
50 in total
  • [31] Multimodal Video Concept Classification based on Convolutional Neural Network and Audio Feature Combination
    Selbes, Berkay
    Sert, Mustafa
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [32] Audio-Visual Atoms for Generic Video Concept Classification
    Jiang, Wei
    Cotton, Courtenay
    Chang, Shih-Fu
    Ellis, Dan
    Loui, Alexander C.
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2010, 6 (03)
  • [33] Genre-Adaptive Semantic Computing and Audio-Based Modelling for Music Mood Annotation
    Saari, Pasi
    Fazekas, Gyorgy
    Eerola, Tuomas
    Barthet, Mathieu
    Lartillot, Olivier
    Sandler, Mark
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2016, 7 (02) : 122 - 135
  • [34] A 15-Category Audio Dataset for Drones and an Audio-Based UAV Classification Using Machine Learning
    Wang, Mia Yaqin
    Chu, Zhiwei
    Ku, Ilmun
    Smith, E. Cho
    Matson, Eric T.
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2024, 18 (02) : 257 - 272
  • [35] Adaptive Audio-Based Context Recognition
    Dargie, Waltenegus
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2009, 39 (04) : 715 - 725
  • [36] Visual music transcription of clarinet video recordings trained with audio-based labelled data
    Zinemanas, Pablo
    Arias, Pablo
    Haro, Gloria
    Gomez, Emilia
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 463 - 470
  • [37] Audio Surveillance: Detection of Audio-Based Emergency Situations
    Dosbayev, Zhandos
    Abdrakhmanov, Rustam
    Akhmetova, Oxana
    Nurtas, Marat
    Iztayev, Zhalgasbek
    Zhaidakbaeva, Lyazzat
    Shaimerdenova, Lazzat
    ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 1463 : 413 - 424
  • [38] Audio-Based Epileptic Seizure Detection
    Ahsan, M. N. Istiaq
    Kertesz, Csaba
    Mesaros, Annamaria
    Heittola, Toni
    Knight, Andrew
    Virtanen, Tuomas
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [39] Sound Event Classification with Feature Vector Combination for Automatic Audio-based Surveillance
    Lee, Seunghyung
    Park, Jinuk
    Park, Sangjun
    Hahn, Minsoo
    2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2016,
  • [40] EXPLORING META INFORMATION FOR AUDIO-BASED ZERO-SHOT BIRD CLASSIFICATION
    Gebhard, Alexander
    Triantafyllopoulos, Andreas
    Bez, Teresa
    Christ, Lukas
    Kathan, Alexander
    Schuller, Bjoern W.
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 1211 - 1215