Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification

Cited by: 190
Authors
Valero, Xavier [1]
Alias, Francesc [1]
Affiliation
[1] La Salle Univ Ramon Llull, GTM Grp Recerca Tecnol Media, Barcelona 08022, Spain
Keywords
Audio classification; audio scene recognition; environmental sound; feature extraction; Gammatone cepstral coefficients; ENVIRONMENTAL SOUND RECOGNITION; FREQUENCY
DOI
10.1109/TMM.2012.2199972
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the context of non-speech audio recognition and classification for multimedia applications, it is essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCCs) have become a de facto standard for audio parameterization. Taking the MFCC computation scheme as a basis, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification that employs Gammatone filters with equivalent rectangular bandwidth (ERB) bands. In this letter, the GTCCs, which have previously been employed in speech research, are adapted for non-speech audio classification. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCCs than when employing other state-of-the-art audio features. As a detailed analysis shows, at a similar computational cost, the GTCCs represent the spectral characteristics of non-speech audio signals more effectively than MFCCs, especially at low frequencies.
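The MFCC-style scheme summarized in the abstract (window a frame, take its spectrum, pass it through a Gammatone filterbank on ERB-spaced bands, log-compress, apply a DCT) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the filterbank is applied as frequency-domain weights on the frame's power spectrum, each filter uses the common approximation of a fourth-order gammatone magnitude response with bandwidth 1.019 x ERB, and the names (`gtcc`, `gammatone_gain`, etc.), the filter count (32), the number of coefficients (13) and the 50 Hz lower edge are all hypothetical choices.

```python
import cmath
import math


def hz_to_erb_rate(f):
    """Glasberg & Moore ERB-rate (ERB-number) scale, f in Hz."""
    return 21.4 * math.log10(1.0 + 4.37e-3 * f)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3


def erb_bandwidth(f):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f."""
    return 24.7 * (4.37e-3 * f + 1.0)


def center_frequencies(n_filters, f_low, f_high):
    """n_filters centre frequencies equally spaced on the ERB-rate scale (n_filters >= 2)."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]


def gammatone_gain(f, fc, order=4):
    """Approximate magnitude response of an order-n gammatone filter (assumption: b = 1.019 ERB)."""
    b = 1.019 * erb_bandwidth(fc)
    return (1.0 + ((f - fc) / b) ** 2) ** (-order / 2.0)


def power_spectrum(frame):
    """Power of the one-sided DFT of a Hamming-windowed frame (plain O(N^2) DFT)."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, win)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def gtcc(frame, sample_rate, n_filters=32, n_ceps=13, f_low=50.0):
    """Gammatone cepstral coefficients of one frame (hypothetical sketch)."""
    n = len(frame)
    spec = power_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(spec))]
    fcs = center_frequencies(n_filters, f_low, sample_rate / 2.0)
    # Gammatone filterbank energies, then log compression as in the MFCC scheme.
    log_e = []
    for fc in fcs:
        energy = sum(p * gammatone_gain(f, fc) for p, f in zip(spec, freqs))
        log_e.append(math.log(energy + 1e-12))
    # A DCT-II decorrelates the log energies into cepstral coefficients.
    return [
        sum(le * math.cos(math.pi * j * (m + 0.5) / n_filters) for m, le in enumerate(log_e))
        for j in range(n_ceps)
    ]


if __name__ == "__main__":
    sr = 8000
    frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(256)]
    print(len(gtcc(frame, sr)))  # prints 13
```

Swapping `gammatone_gain` for triangular weights on Mel-spaced centre frequencies would give the MFCC pipeline; everything else stays the same, which matches the abstract's point that GTCCs and MFCCs have a similar computational cost. A production version would use an FFT instead of the direct DFT shown here.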
Pages: 1684–1689
Number of pages: 6