Significance of neural phonotactic models for large-scale spoken language identification

Cited by: 0
Authors
Srivastava, Brij Mohan Lal [1 ]
Vydana, Hari [1 ]
Vuppala, Anil Kumar [1 ]
Shrivastava, Manish [1 ]
Affiliations
[1] Int Inst Informat Technol, Language Technol Res Ctr, Hyderabad, Andhra Pradesh, India
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Language identification (LID) is a vital front end for spoken dialogue systems operating in diverse linguistic settings, reducing recognition and understanding errors. Existing LID systems that use low-level signal information for classification do not scale well, because the number of parameters grows exponentially with the number of classes. They also suffer performance degradation from the inherent variabilities of the speech signal. In the proposed approach, we model the language-specific phonotactic information in speech using a recurrent neural network to develop an LID system. The input speech signal is tokenized into phone sequences by a common language-independent phone recognizer with varying phonetic coverage, and we establish a causal relationship between phonetic coverage and LID performance. The phonotactics in the observed phone sequences are modeled using statistical and recurrent neural network language models to predict the language-specific symbol from a universal phonetic inventory. The proposed approach is robust, computationally lightweight, and highly scalable. Experiments show that the convex combination of the statistical and recurrent neural network language model (RNNLM) based phonotactic models significantly outperforms a strong Deep Neural Network (DNN) baseline, which in turn surpasses the performance of the i-vector based approach to LID. The proposed approach outperforms the baseline models in terms of mean F1 score over 176 languages. Further, we provide significant information-theoretic evidence to analyze the mechanism of the proposed approach.
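The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the language names, score values, and the interpolation weight `lam` are all hypothetical, and the per-language log-likelihoods are assumed to come from an n-gram language model and an RNNLM scoring the same tokenized phone sequence.

```python
# Hypothetical per-language log-likelihoods of one tokenized phone
# sequence under two phonotactic models (illustrative values only):
# a statistical n-gram LM and an RNNLM.
ngram_logprob = {"hindi": -42.0, "telugu": -45.5, "tamil": -47.2}
rnnlm_logprob = {"hindi": -40.3, "telugu": -44.1, "tamil": -41.8}

def identify_language(ngram_scores, rnn_scores, lam=0.5):
    """Convex combination of the two model scores per language;
    the language with the highest fused score is the LID decision."""
    fused = {
        lang: lam * ngram_scores[lang] + (1.0 - lam) * rnn_scores[lang]
        for lang in ngram_scores
    }
    best = max(fused, key=fused.get)
    return best, fused

best, fused = identify_language(ngram_logprob, rnnlm_logprob, lam=0.5)
print(best)  # the language whose fused score is highest
```

With equal weights, the fused score is the mean of the two log-likelihoods; in practice the weight `lam` would be tuned on held-out data.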
Pages: 2144-2151
Page count: 8
Related papers
50 items in total
  • [1] Fusion of Contrastive Acoustic Models for Parallel Phonotactic Spoken Language Identification
    Sim, Khe Chai
    Li, Haizhou
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 541 - 544
  • [2] Phonotactic spoken language identification with limited training data
    Peche, Marius
    Davel, Marelie
    Barnard, Etienne
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1661 - 1664
  • [3] Integrating acoustic, prosodic and phonotactic features for spoken language identification
    Tong, Rong
    Ma, Bin
    Zhu, Donglai
    Li, Haizhou
    Chng, Eng Siong
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 205 - 208
  • [4] Text- and speech-based phonotactic models for spoken language identification of Basque and Spanish
    Guijarrubia, Victor G.
    Ines Torres, M.
    PATTERN RECOGNITION LETTERS, 2010, 31 (06) : 523 - 532
  • [5] Natural Language Processing in Large-Scale Neural Models for Medical Screenings
    Stille, Catharina Marie
    Bekolay, Trevor
    Blouw, Peter
    Kroeger, Bernd J.
    FRONTIERS IN ROBOTICS AND AI, 2019, 6
  • [6] LARGE-SCALE WORD REPRESENTATION FEATURES FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING
    Zhang, Jun
    Yang, Terry Zhenrong
    Hazen, Timothy J.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5306 - 5310
  • [7] Identification of parameters for large-scale kinetic models
    Abdulla, Ugur G.
    Poteau, Roby
    JOURNAL OF COMPUTATIONAL PHYSICS, 2021, 429
  • [8] Improved phonotactic language identification using random forest language models
    Wang, XiaoRui
    Wang, ShiJin
    Liang, JiaEn
    Xu, Bo
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4237 - 4240
  • [9] Large-scale Transfer Learning for Low-resource Spoken Language Understanding
    Jia, Xueli
    Wang, Jianzong
    Zhang, Zhiyong
    Cheng, Ning
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 1555 - 1559
  • [10] DECIDING WHETHER TO ASK CLARIFYING QUESTIONS IN LARGE-SCALE SPOKEN LANGUAGE UNDERSTANDING
    Kim, Joo-Kyung
    Wang, Guoyin
    Lee, Sungjin
    Kim, Young-Bum
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 869 - 876