A comparison of features for speech, music discrimination.

Cited by: 79
Authors
Carey, MJ [1]
Parris, ES [1]
Lloyd-Thomas, H [1]
Affiliations
[1] Ensigma Ltd, Chepstow, Mons, England
DOI
10.1109/ICASSP.1999.758084
CLC number
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
Several approaches have previously been taken to the problem of discriminating between speech and music signals. These have used different features as the input to the classifier and have been trained and tested on different material. In this paper we examine the discrimination achieved by several different features using common training and test sets and the same classifier. The database assembled for these tests includes speech from thirteen languages and music from all over the world. In each case the distributions in the feature space were modelled by a Gaussian mixture model. Experiments were carried out on four types of feature: amplitude, cepstra, pitch and zero-crossings. In each case the derivative of the feature was also used and found to improve performance. The best performance resulted from using the cepstra and delta cepstra, which gave an equal error rate (EER) of 1.2%. This was closely followed by normalised amplitude and delta amplitude, which, however, used a much less complex model. The pitch and delta pitch gave an EER of 4%, which was better than the zero-crossings, which produced an EER of 6%.
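As a rough illustration of three terms the abstract relies on (zero-crossings, a feature's derivative, and the equal error rate), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the first-difference form of the delta, and the threshold sweep used to approximate the EER are all assumptions made for illustration, and the Gaussian mixture modelling step is omitted.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in a frame whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)


def delta(values):
    """First difference of a per-frame feature track.

    Assumption: a plain first difference, padded with 0.0 so the output
    has the same length as the input. The abstract does not say how the
    derivative was computed.
    """
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def equal_error_rate(speech_scores, music_scores):
    """Approximate EER, assuming a higher score means 'more speech-like'.

    Sweeps every observed score as a threshold and returns the mean of
    the miss rate and false-alarm rate where the two are closest.
    """
    thresholds = sorted(set(speech_scores) | set(music_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss: speech scored below the threshold.
        miss = sum(s < t for s in speech_scores) / len(speech_scores)
        # False alarm: music scored at or above the threshold.
        false_alarm = sum(m >= t for m in music_scores) / len(music_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


if __name__ == "__main__":
    # Toy, made-up numbers purely to exercise the functions.
    print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
    print(delta([1.0, 3.0, 6.0]))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]))
```

In a setup like the one described, each feature and its delta would be modelled per class and the resulting scores passed to an EER routine of this kind; the toy scores above are invented and do not correspond to any result in the paper.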
Pages: 149-152
Page count: 4