Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition

Cited by: 2
Authors
Chen, Young-Long [1]
Wang, Neng-Chung [2]
Ciou, Jing-Fong [1]
Lin, Rui-Qi [1]
Affiliations
[1] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404336, Taiwan
[2] Natl United Univ, Dept Comp Sci & Informat Engn, Miaoli 360302, Taiwan
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 12
Keywords
speaker recognition; neural network; long short-term memory; mel-frequency cepstral coefficients; triplet loss; IDENTIFICATION; CLASSIFICATION
DOI
10.3390/app13127008
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Neural network technology has recently made remarkable progress in speech-related tasks, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods aimed at improving accuracy. The first, long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), uses MFCCs as input features for an LSTM model and trains it with triplet loss and cluster training. The second, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), improves recognition accuracy by employing a bidirectional LSTM model. The third, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), uses an autoencoder to extract additional AE features, which are concatenated with the MFCCs and fed into the BLSTM model. In the experiments, the BLSTM model outperformed the LSTM model, and adding AE features gave the best learning performance. The proposed methods also ran faster than the reference GMM-HMM model. Using a pre-trained autoencoder to encode speakers and obtain AE features can therefore significantly improve the learning performance of speaker recognition while requiring less computation time than traditional methods.
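
Two of the ideas in the abstract, concatenating AE features with MFCCs and training with triplet loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared Euclidean distance, the margin of 0.2, and the toy feature sizes are all assumptions made for illustration, and the abstract does not specify any of them.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    The margin value 0.2 is an illustrative assumption.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


def concat_frame_features(mfcc_frames: Sequence[Vector],
                          ae_frames: Sequence[Vector]) -> List[List[float]]:
    """Concatenate MFCC and AE feature vectors frame by frame."""
    if len(mfcc_frames) != len(ae_frames):
        raise ValueError("MFCC and AE sequences must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, ae_frames)]


# Toy example: 2 frames, 3 MFCC values and 2 AE values per frame (sizes are made up).
mfcc = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
ae = [[1.0, 2.0], [3.0, 4.0]]
combined = concat_frame_features(mfcc, ae)  # each frame now has 5 values

# A well-separated triple gives zero loss; a violated one gives a positive loss.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0])
```

In a setup like the one described, the combined per-frame vectors would be the input sequence to the BLSTM, and the loss would be computed on the embeddings the network outputs rather than on raw features.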
Pages: 19