Learnable MFCCs for Speaker Verification

Cited by: 5
Authors
Liu, Xuechen [1 ,2 ]
Sahidullah, Md [2 ]
Kinnunen, Tomi [1 ]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu, Finland
[2] Univ Lorraine, CNRS, INRIA, LORIA, F-54000 Nancy, France
Funding
Academy of Finland;
Keywords
Speaker verification; feature extraction; mel-frequency cepstral coefficients (MFCCs); RECOGNITION; FEATURES;
DOI
10.1109/ISCAS51556.2021.9401593
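Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract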
We propose a learnable mel-frequency cepstral coefficients (MFCCs) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.
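The four linear transforms named in the abstract can be sketched as explicit arrays, which makes it easier to see what a data-driven version would adapt. The following is a minimal NumPy illustration of a standard (static) MFCC extractor, not the authors' implementation. All function names are hypothetical, and the Hamming window, 512-point DFT, 40 mel filters, 20 coefficients and 16 kHz sampling rate are assumed values chosen only for the example.

```python
import numpy as np

def dft_matrix(n_fft):
    # One-sided DFT as a complex matrix of shape (n_fft // 2 + 1, n_fft).
    k = np.arange(n_fft // 2 + 1)[:, None]
    n = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * n / n_fft)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale (assumed formula:
    # mel = 2595 * log10(1 + f / 700)).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n_ceps, n_mels):
    # First n_ceps rows of the orthonormal DCT-II matrix.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    d = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    d[0] /= np.sqrt(2.0)
    return d

def mfcc(frames, window, dft, mel_fb, dct, eps=1e-10):
    # frames: (num_frames, n_fft) array of already-framed samples.
    windowed = frames * window               # 1) windowing
    spectrum = windowed @ dft.T              # 2) DFT
    mel_energy = (np.abs(spectrum) ** 2) @ mel_fb.T  # 3) mel filterbank
    return np.log(mel_energy + eps) @ dct.T  # 4) log compression, then DCT

# Usage with assumed settings and random frames standing in for speech.
n_fft, n_mels, n_ceps, sr = 512, 40, 20, 16000
window = np.hamming(n_fft)
dft = dft_matrix(n_fft)
mel_fb = mel_filterbank(n_mels, n_fft, sr)
dct = dct_matrix(n_ceps, n_mels)
frames = np.random.default_rng(0).standard_normal((3, n_fft))
feats = mfcc(frames, window, dft, mel_fb, dct)
print(feats.shape)
```

Writing each stage as an array multiplication is a design choice made for illustration: in a learnable front-end of the kind the abstract describes, one or more of `window`, `dft`, `mel_fb` and `dct` would presumably be replaced by trainable parameters, whereas here they stay fixed.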
Pages: 5
Related papers
50 in total
  • [1] Learnable Sparse Filterbank for Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Mosner, Ladislav
    Plchot, Oldrich
    Burget, Lukas
    Cernocky, Jan
    INTERSPEECH 2022, 2022, : 5110 - 5114
  • [2] LEARNABLE NONLINEAR COMPRESSION FOR ROBUST SPEAKER VERIFICATION
    Liu, Xuechen
    Sahidullah, Md
    Kinnunen, Tomi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7962 - 7966
  • [3] Advantages of Wideband over Narrowband Channels for Speaker Verification Employing MFCCs and LFCCs
    Gallardo, Laura Fernandez
    Wagner, Michael
    Moeller, Sebastian
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1115 - 1119
  • [4] LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
    Chen, Xing
    Wang, Jie
    Zhang, Xiao-Lei
    Zhang, Wei-Qiang
    Yang, Kunde
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2476 - 2490
  • [5] On the effectiveness of MFCCs and their statistical distribution properties in speaker identification
    Molla, KI
    Hirose, K
    2004 IEEE SYMPOSIUM ON VIRTUAL ENVIRONMENTS, HUMAN-COMPUTER INTERFACES AND MEASUREMENT SYSTEMS, 2004, : 136 - 141
  • [6] Scale-invariant MFCCs for speech/speaker recognition
    Tufekci, Zekeriya
    Disken, Gokay
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (05) : 3758 - 3762
  • [7] Channel Robust MFCCs for Continuous Speech Speaker Recognition
    Chougule, Sharada Vikram
    Chavan, Mahesh S.
    ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2014, 264 : 557 - 568
  • [8] Speaker recognition via fusion of subglottal features and MFCCs
    Arsikere, Harish
    Gupta, Hitesh Anand
    Alwan, Abeer
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1106 - 1110
  • [9] Automatic Speaker Recognition Dependency on Both the Shape of Auditory Critical Bands and Speaker Discriminative MFCCs
    Jokic, Ivan
    Delic, Vlado
    Jokic, Stevan
    Peric, Zoran
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2015, 15 (04) : 25 - 32
  • [10] SPEAKER VERIFICATION
    CHAPMAN, WD
    LI, KP
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1966, 40 (05) : 1282 - &