Modulation features for speech and music classification

Cited: 0
Authors
Mubarak, Omer Mohsin [1 ]
Ambikairajah, Eliathamby [1 ]
Epps, Julien [2 ]
Gunawan, Teddy Surya [1 ]
Affiliations
[1] Univ New South Wales, Sch Elect Engn & Telecommun, Sydney, NSW, Australia
[2] NICTA, Australian Technology Park, Eveleigh, NSW 1430, Australia
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TN [Electronic technology; communication technology]
Discipline classification code
0809
Abstract
Many approaches to accurately classifying speech and music have been investigated over the years. This paper presents modulation features for effective speech and music classification. A Gammatone filter bank is used as the front-end of the classification system, and amplitude modulation (AM) and frequency modulation (FM) features are extracted from the critical-band outputs of the Gammatone filters. In addition, cepstral coefficients are calculated from the energies of the filter-bank outputs. The cepstral coefficients and the AM and FM components are given as input feature vectors to Gaussian mixture models (GMMs), which act as the speech-music classifier. The output probabilities of all GMMs are combined before a decision is made. Error rates for different types of music are also compared. Low-frequency musical instruments such as the electric bass guitar were found to be more difficult to discriminate from speech; however, the proposed features reduced such errors significantly.
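The pipeline described in the abstract (a Gammatone filter-bank front-end, AM/FM features from the critical-band outputs, cepstral coefficients from the filter-bank energies, and per-stream GMMs whose scores are combined) can be illustrated with the minimal sketch below. It assumes Hilbert-envelope AM estimation, instantaneous-frequency FM estimation, a DCT over log filter-bank energies for the cepstral stream, and summation of per-stream log-likelihoods for the final decision; the filter-bank size, frame length, GMM order, and score-combination rule are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' implementation): Gammatone front-end,
# AM/FM and cepstral features, per-stream GMMs with combined scores.
import numpy as np
from scipy.signal import gammatone, lfilter, hilbert   # SciPy >= 1.6 for gammatone
from scipy.fft import dct
from sklearn.mixture import GaussianMixture


def gammatone_bank(x, fs, centre_freqs):
    """Filter x through a bank of 4th-order IIR Gammatone filters."""
    bands = []
    for fc in centre_freqs:
        b, a = gammatone(fc, 'iir', fs=fs)
        bands.append(lfilter(b, a, x))
    return np.stack(bands)                              # shape (n_bands, n_samples)


def am_fm_features(bands, fs, frame_len=400):
    """Frame-level AM (mean envelope) and FM (instantaneous-frequency spread)."""
    analytic = hilbert(bands, axis=1)
    envelope = np.abs(analytic)                                          # AM component
    inst_freq = np.diff(np.unwrap(np.angle(analytic), axis=1), axis=1) \
        * fs / (2.0 * np.pi)                                             # FM component (Hz)
    n_frames = bands.shape[1] // frame_len               # frame_len=400 ~ 25 ms at 16 kHz
    am = np.array([envelope[:, i * frame_len:(i + 1) * frame_len].mean(axis=1)
                   for i in range(n_frames)])
    fm = np.array([inst_freq[:, i * frame_len:(i + 1) * frame_len].std(axis=1)
                   for i in range(n_frames)])
    return am, fm                                        # each (n_frames, n_bands)


def cepstral_features(bands, frame_len=400, n_ceps=13):
    """Cepstral coefficients: DCT of log filter-bank energies per frame."""
    n_frames = bands.shape[1] // frame_len
    log_e = np.array([
        np.log((bands[:, i * frame_len:(i + 1) * frame_len] ** 2).sum(axis=1) + 1e-10)
        for i in range(n_frames)])
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]


def train_stream_gmm(frames, n_components=8):
    """Train one GMM on a single feature stream for one class (speech or music)."""
    return GaussianMixture(n_components=n_components,
                           covariance_type='diag').fit(frames)


def classify(streams, speech_gmms, music_gmms):
    """Sum per-stream average log-likelihoods and decide speech vs. music."""
    speech_score = sum(g.score(f) for g, f in zip(speech_gmms, streams))
    music_score = sum(g.score(f) for g, f in zip(music_gmms, streams))
    return 'speech' if speech_score > music_score else 'music'
```

In use, one GMM per class would be trained for each feature stream (AM, FM, cepstral) on labelled speech and music data, and classify would combine the three streams for a test segment; the abstract's probability combination could equally be a weighted sum or product rule.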
Pages: 764+
Number of pages: 3
Related Papers
50 records in total
  • [1] Speech/music classification using speech-specific features
    Khonglah, Banriskhem K.
    Prasanna, S. R. Mahadeva
    [J]. DIGITAL SIGNAL PROCESSING, 2016, 48 : 71 - 83
  • [2] EMOTION CLASSIFICATION OF SPEECH USING MODULATION FEATURES
    Chaspari, Theodora
    Dimitriadis, Dimitrios
    Maragos, Petros
    [J]. 2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 1552 - 1556
  • [3] Speech/music classification using visual and spectral chromagram features
    Birajdar, Gajanan K.
    Patil, Mukesh D.
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 11 (01) : 329 - 347
  • [4] Speech/music classification using visual and spectral chromagram features
    Gajanan K. Birajdar
    Mukesh D. Patil
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2020, 11 : 329 - 347
  • [5] Speech/Music Classification Using Features From Spectral Peaks
    Bhattacharjee, Mrinmoy
    Prasanna, S. R. Mahadeva
    Guha, Prithwijit
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1549 - 1559
  • [6] Automatic Music Mood Classification Based on Timbre and Modulation Features
    Ren, Jia-Min
    Wu, Ming-Ju
    Jang, Jyh-Shing Roger
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2015, 6 (03) : 236 - 246
  • [7] Music Classification Using the Bag of Words Model of Modulation Spectral Features
    Lee, Chang-Hsing
    Lin, Hwai-San
    Chen, Ling-Hwei
    [J]. 2015 15TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2015, : 121 - 124
  • [8] Stacked auto-encoders based visual features for speech/music classification
    Kumar, Arvind
    Solanki, Sandeep Singh
    Chandra, Mahesh
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 208
  • [9] Speech/music segmentation using entropy and dynamism features in a HMM classification framework
    Ajmera, J
    McCowan, I
    Bourlard, H
    [J]. SPEECH COMMUNICATION, 2003, 40 (03) : 351 - 363
  • [10] MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION
    Sell, Gregory
    Clark, Pascal
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014