Speech/Music Discrimination in Audio Podcast Using Structural Segmentation and Timbre Recognition

Authors
Barthet, Mathieu [1]
Hargreaves, Steven [1]
Sandler, Mark [1]
Affiliations
[1] Queen Mary Univ London, Ctr Digital Mus, London E1 4NS, England
Source
EXPLORING MUSIC CONTENTS | 2011 / Vol. 6684
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Speech/Music Discrimination; Audio Podcast; Timbre Recognition; Structural Segmentation; Line Spectral Frequencies; K-means clustering; Mel-Frequency Cepstral Coefficients; Hidden Markov Models; DIMENSIONS; PREDICTOR;
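DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract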
We propose two speech/music discrimination methods based on timbre models and measure their performance on a 3-hour database of radio podcasts from the BBC. In the first method, the machine-estimated classifications obtained with an automatic timbre recognition (ATR) model are post-processed using median filtering. The classification system (LSF/K-means) was trained using two different taxonomic levels: a high-level one (speech, music) and a lower-level one (male and female speech, classical, jazz, rock pop). The second method combines automatic structural segmentation and timbre recognition (ASS/ATR). The ASS evaluates the similarity between feature distributions (MFCC, RMS) using HMM and soft K-means algorithms. Both methods were evaluated at the semantic level (relative correct overlap, RCO) and at the temporal level (boundary retrieval F-measure). The ASS/ATR method obtained the best results (average RCO of 94.5% and boundary F-measure of 50.1%). These results compare favourably with those obtained by an SVM-based technique, which provides a good benchmark of the state of the art.
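The median-filtering post-processing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes binary frame-level labels (0 = speech, 1 = music), an odd window length, and edge padding by repetition; none of these details are stated in the abstract. The function names `median_filter_labels` and `frame_agreement` are hypothetical, and `frame_agreement` is only a simplified frame-level stand-in for the relative correct overlap, whose exact definition is not given here.

```python
from statistics import median


def median_filter_labels(labels, window=5):
    """Smooth binary frame labels (0 = speech, 1 = music) with a sliding median.

    Assumption: the window is odd and the edges are padded by repeating the
    first and last labels. The paper's actual settings are not given here.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not labels:
        return []
    half = window // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # For binary labels and an odd window, the median is a majority vote.
    return [int(median(padded[i:i + window])) for i in range(len(labels))]


def frame_agreement(predicted, reference):
    """Fraction of frames whose predicted label matches the reference label.

    Assumption: a simplified proxy for an overlap-based score such as RCO.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical example: two isolated misclassified frames in the raw output.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
truth = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
smoothed = median_filter_labels(raw, window=3)
```

In this toy example the filter removes the two isolated frame errors, so the agreement with the reference rises after smoothing. Longer windows suppress more spurious switches but can blur true speech/music boundaries, which is the trade-off a boundary-level measure is meant to expose.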
Pages: 138-162
Page count: 25