Scale-invariant MFCCs for speech/speaker recognition

Cited by: 1
Authors
Tufekci, Zekeriya [1]
Disken, Gokay [2]
Affiliations
[1] Cukurova Univ, Fac Engn, Dept Comp Engn, Adana, Turkey
[2] Adana Sci & Technol Univ, Fac Engn, Dept Elect & Elect Engn, Adana, Turkey
Keywords
Feature extraction; speaker recognition; speech recognition; SPEECH;
DOI
10.3906/elk-1901-231
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The feature extraction process is a fundamental part of speech processing. Mel-frequency cepstral coefficients (MFCCs) are the most commonly used features in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic range problems, which degrade its performance. A practical solution to these problems is to add a constant to the filter-bank magnitudes before log compression, but this violates the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
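The scale-invariance issue described in the abstract can be illustrated numerically. The sketch below is not the authors' formulation, which the abstract does not give. It assumes a hypothetical normalization (dividing the filter-bank magnitudes by their utterance-level mean) and a hypothetical multiplication constant `k`, and it uses random positive numbers in place of real filter-bank outputs. It only shows why `log(x + c)` depends on the input gain while a normalized variant does not.

```python
import numpy as np

def dct_ii(x, n_ceps):
    """Unnormalized DCT-II along the last axis, keeping the first n_ceps terms."""
    n = x.shape[-1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return x @ basis.T

def cepstra_plain(fb, n_ceps=13):
    # Standard log compression: a gain only shifts every log value by log(gain),
    # which affects coefficient 0 alone.
    return dct_ii(np.log(fb), n_ceps)

def cepstra_offset(fb, n_ceps=13, c=1.0):
    # Adding a constant before the log avoids log(0) but breaks scale invariance.
    return dct_ii(np.log(fb + c), n_ceps)

def cepstra_normalized(fb, n_ceps=13, k=100.0):
    # ASSUMPTION: normalize by the utterance-level mean magnitude, then apply a
    # multiplication constant k before adding 1. Both choices are illustrative.
    scaled = k * fb / fb.mean()
    return dct_ii(np.log(scaled + 1.0), n_ceps)

# Synthetic stand-in for filter-bank magnitudes: 50 frames, 26 filters.
rng = np.random.default_rng(0)
fb = rng.uniform(0.1, 10.0, size=(50, 26))
gain = 7.5  # the same utterance recorded at a different level

plain_ok = np.allclose(cepstra_plain(fb)[:, 1:], cepstra_plain(gain * fb)[:, 1:])
offset_ok = np.allclose(cepstra_offset(fb), cepstra_offset(gain * fb))
norm_ok = np.allclose(cepstra_normalized(fb), cepstra_normalized(gain * fb))
print(plain_ok, offset_ok, norm_ok)
```

In this sketch the gain cancels inside `fb / fb.mean()`, so the normalized cepstra are identical for both recording levels, while the constant-offset variant changes with the gain.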
Pages: 3758-3762
Number of pages: 5