Audio-Based Semantic Concept Classification for Consumer Video

Cited by: 57
Authors
Lee, Keansub [1]
Ellis, Daniel P. W. [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, Lab Recognit & Org Speech & Audio LabROSA, New York, NY 10027 USA
Funding
National Science Foundation (USA)
Keywords
Audio classification; consumer video classification; semantic concept detection; soundtrack analysis; RETRIEVAL; MUSIC; SEGMENTATION;
DOI
10.1109/TASL.2009.2034776
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
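To make the pipeline in the abstract more concrete, the following is a minimal sketch of one of the described routes: summarizing each clip's MFCC frames with a single Gaussian and comparing clips with the Kullback-Leibler divergence. Several details are assumptions rather than the authors' method: the frames are synthetic random vectors standing in for real MFCCs, the covariance is restricted to a diagonal for simplicity, the divergence is symmetrized by summing both directions, and the kernel form `exp(-gamma * D)` with `gamma = 0.1` is an illustrative choice. All function names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Summarize a clip (a list of equal-length feature frames) by its
    per-dimension mean and variance. Diagonal covariance is an assumption."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A small floor keeps variances strictly positive.
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6
           for d in range(dim)]
    return mean, var


def kl_diag(p, q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


def kernel_matrix(models, gamma=0.1):
    """Turn pairwise divergences into similarities via exp(-gamma * D).
    The kernel form and gamma value are illustrative assumptions."""
    return [[math.exp(-gamma * symmetric_kl(a, b)) for b in models]
            for a in models]


def fake_clip(rng, offset, n_frames=200, dim=13):
    """Synthetic stand-in for a clip's MFCC frames (not real audio features)."""
    return [[rng.gauss(offset, 1.0) for _ in range(dim)]
            for _ in range(n_frames)]


rng = random.Random(0)
# Two clips drawn from the same distribution and one shifted clip.
clips = [fake_clip(rng, 0.0), fake_clip(rng, 0.0), fake_clip(rng, 3.0)]
models = [fit_diag_gaussian(c) for c in clips]
K = kernel_matrix(models)
```

In this sketch, `K` plays the role of a precomputed similarity matrix between clips. A matrix of this kind could be handed to an SVM implementation that accepts precomputed kernels, with one binary classifier trained per semantic concept.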
Pages: 1406-1416
Number of pages: 11
Related papers
50 in total
  • [1] Robust Audio-based Classification of Video Genre
    Rouvier, Mickael
    Linares, Georges
    Matrouf, Driss
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1155 - 1158
  • [2] Factor Analysis for Audio-based Video Genre Classification
    Rouvier, Mickael
    Matrouf, Driss
    Linares, Georges
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1131 - 1134
  • [3] Audio-Based Video Genre Identification
    Rouvier, Mickael
    Oger, Stanislas
    Linares, Georges
    Matrouf, Driss
    Merialdo, Bernard
    Li, Yingbo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (06) : 1031 - 1041
  • [4] AUDIO-BASED NONLINEAR VIDEO DIFFUSION
    Casanovas, Anna Llagostera
    Vandergheynst, Pierre
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2486 - 2489
  • [5] AUDIO-BASED CLASSIFICATION OF SPEAKER CHARACTERISTICS
    Dutta, Promiti
    Haubold, Alexander
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 422 - 425
  • [6] Audio-based event detection for sports video
    Baillie, M
    Jose, JM
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2003, 2728 : 300 - 309
  • [7] Combining audio-based and video-based shot classification systems for news videos segmentation
    De Santo, M
    Percannella, G
    Sansone, C
    Vento, M
    MULTIPLE CLASSIFIER SYSTEMS, 2005, 3541 : 397 - 406
  • [8] A Survey of Audio-Based Music Classification and Annotation
    Fu, Zhouyu
    Lu, Guojun
    Ting, Kai Ming
    Zhang, Dengsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2011, 13 (02) : 303 - 319
  • [9] Hierarchical structure for audio-video based semantic classification of sports video sequences
    Kolekar, MH
    Sengupta, S
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2005, PTS 1-4, 2005, 5960 : 401 - 409
  • [10] Audio-Based Music Classification with DenseNet and Data Augmentation
    Bian, Wenhao
    Wang, Jie
    Zhuang, Bojin
    Yang, Jiankui
    Wang, Shaojun
    Xiao, Jing
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 56 - 65