Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Cited by: 85
Authors
Zazo, Ruben [1 ]
Lozano-Diez, Alicia [1 ]
Gonzalez-Dominguez, Javier [1 ]
Toledano, Doroteo T. [1 ]
Gonzalez-Rodriguez, Joaquin [1 ]
Affiliations
[1] Univ Autonoma Madrid, ATVS Biometr Recognit Grp, Madrid, Spain
Source
PLOS ONE | 2016, Vol. 11, No. 01
Keywords
SPEAKER
DOI
10.1371/journal.pone.0146917
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (~3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research based on proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those experiments to the modeling of unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we analyze the effect of even more limited test data (from 2.25 s down to 0.1 s), showing that an accuracy of over 50% can be achieved with as little as 0.5 s of speech.
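The abstract describes an end-to-end LSTM classifier over short utterances with an explicit out-of-set class. The sketch below illustrates that general architecture in PyTorch; the feature dimensionality, hidden size, single LSTM layer, and one extra OOS output unit are illustrative assumptions, not the authors' released configuration.

```python
# Minimal, illustrative sketch (not the paper's open-source code) of an
# end-to-end LSTM language-identification classifier: frame-level acoustic
# features pass through an LSTM, and the final hidden state is mapped to
# per-language scores, plus one optional out-of-set (OOS) class.
import torch
import torch.nn as nn

class LSTMLanguageID(nn.Module):
    def __init__(self, feat_dim=39, hidden_dim=512, num_languages=8, model_oos=True):
        super().__init__()
        # One extra output unit pools all unseen (out-of-set) languages.
        num_classes = num_languages + (1 if model_oos else 0)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, features):
        # features: (batch, num_frames, feat_dim), e.g. frame-level features
        # extracted from a short utterance.
        _, (h_n, _) = self.lstm(features)
        # The final hidden state summarizes the whole utterance.
        return self.classifier(h_n[-1])

# Example: score two ~3 s utterances (300 frames at a 10 ms hop, 39-dim features).
model = LSTMLanguageID()
logits = model(torch.randn(2, 300, 39))
probs = torch.softmax(logits, dim=-1)  # posteriors over 8 target languages + OOS
```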
Pages: 17
Related Papers
50 records in total
  • [21] Long Short-Term Memory (LSTM) Deep Neural Networks in Energy Appliances Prediction
    Kouziokas, Georgios N.
    2019 PANHELLENIC CONFERENCE ON ELECTRONICS AND TELECOMMUNICATIONS (PACET2019), 2019, : 162 - 166
  • [22] Multilayer Long Short-Term Memory (LSTM) Neural Networks in Time Series Analysis
    Malinovic, Nemanja S.
    Predic, Bratislav B.
    Roganovic, Milos
    2020 55TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATION, COMMUNICATION AND ENERGY SYSTEMS AND TECHNOLOGIES (IEEE ICEST 2020), 2020, : 11 - 14
  • [23] Detecting Overlapping Speech with Long Short-Term Memory Recurrent Neural Networks
    Geiger, Juergen T.
    Eyben, Florian
    Schuller, Bjoern
    Rigoll, Gerhard
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1667 - 1671
  • [24] Long Short-Term Memory Based Recurrent Neural Networks for Collaborative Filtering
    Zou, Lixin
    Gu, Yulong
    Song, Jiaxing
    Liu, Weidong
    Yao, Yuan
    2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2017,
  • [25] Long and Short-Term Recommendations with Recurrent Neural Networks
    Devooght, Robin
    Bersini, Hugues
    PROCEEDINGS OF THE 25TH CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION (UMAP'17), 2017, : 13 - 21
  • [26] Recognition of Sign Language System for Indonesian Language Using Long Short-Term Memory Neural Networks
    Rakun, Erdefi
    Arymurthy, Aniati M.
    Stefanus, Lim Y.
    Wicaksono, Alfan F.
    Wisesa, I. Wayan W.
    ADVANCED SCIENCE LETTERS, 2018, 24 (02) : 999 - 1004
  • [27] Using Long Short-Term Memory (LSTM) Neural Networks to Predict Emergency Department Wait Time
    Cheng, Nok
    Kuo, Alex
    IMPORTANCE OF HEALTH INFORMATICS IN PUBLIC HEALTH DURING A PANDEMIC, 2020, 272 : 199 - 202
  • [28] Using Long Short-Term Memory (LSTM) Neural Networks to Predict Emergency Department Wait Time
    Cheng, Nok
    Kuo, Alex
    DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 1425 - 1426
  • [29] Using Ant Colony Optimization to Optimize Long Short-Term Memory Recurrent Neural Networks
    ElSaid, AbdElRahman
    El Jamiy, Fatima
    Higgins, James
    Wild, Brandon
    Desell, Travis
    GECCO'18: PROCEEDINGS OF THE 2018 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2018, : 13 - 20
  • [30] GRAPHEME-TO-PHONEME CONVERSION USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
    Rao, Kanishka
    Peng, Fuchun
    Sak, Hasim
    Beaufays, Françoise
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4225 - 4229