Significance of neural phonotactic models for large-scale spoken language identification

Cited: 0
Authors
Srivastava, Brij Mohan Lal [1]
Vydana, Hari [1]
Vuppala, Anil Kumar [1]
Shrivastava, Manish [1]
Affiliations
[1] Int Inst Informat Technol, Language Technol Res Ctr, Hyderabad, Andhra Pradesh, India
Keywords
RECOGNITION
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Language identification (LID) is a vital front end for spoken dialogue systems operating in diverse linguistic settings, as it reduces recognition and understanding errors. Existing LID systems that use low-level signal information for classification do not scale well, because the number of parameters grows exponentially as the number of classes increases. They also suffer performance degradation due to the inherent variability of the speech signal. In the proposed approach, we model the language-specific phonotactic information in speech using a recurrent neural network to develop an LID system. The input speech signal is tokenized into phone sequences using a common language-independent phone recognizer with varying phonetic coverage, and we establish a causal relationship between phonetic coverage and LID performance. The phonotactics of the observed phone sequences are modeled using statistical and recurrent neural network language models to predict the language-specific symbol from a universal phonetic inventory. The proposed approach is robust, computationally lightweight, and highly scalable. Experiments show that a convex combination of the statistical and recurrent neural network language model (RNNLM) phonotactic scores significantly outperforms a strong deep neural network (DNN) baseline, which itself has been shown to surpass the i-vector-based approach to LID. The proposed approach outperforms the baseline models in terms of mean F1 score over 176 languages. Further, we provide information-theoretic evidence to analyze the mechanism of the proposed approach.
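The scoring scheme the abstract describes — tokenize speech into a phone sequence, score that sequence under per-language phonotactic models, and decide by a convex combination of statistical and neural log scores — can be sketched as follows. This is a toy illustration, not the authors' code: an add-one-smoothed bigram model plays the statistical language model, a unigram model merely stands in for the RNNLM score, and all function names, languages, and phone data are hypothetical.

```python
import math
from collections import defaultdict

def train_bigram(seqs):
    """Add-one-smoothed bigram phonotactic model (stands in for the n-gram LM)."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for seq in seqs:
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[a][b] += 1
            vocab.add(b)
    V = len(vocab)

    def logprob(seq):
        return sum(
            math.log((counts[a][b] + 1) / (sum(counts[a].values()) + V))
            for a, b in zip(["<s>"] + seq, seq + ["</s>"])
        )
    return logprob

def train_unigram(seqs):
    """Unigram model, used here only as a cheap placeholder for the RNNLM score."""
    counts = defaultdict(int)
    for seq in seqs:
        for p in seq:
            counts[p] += 1
    total, V = sum(counts.values()), len(counts)

    def logprob(seq):
        return sum(math.log((counts[p] + 1) / (total + V + 1)) for p in seq)
    return logprob

def identify(phones, models, lam=0.6):
    """Return the language maximizing lam*statistical + (1-lam)*neural log score."""
    scores = {
        lang: lam * ngram(phones) + (1.0 - lam) * rnn(phones)
        for lang, (ngram, rnn) in models.items()
    }
    return max(scores, key=scores.get)

# Toy per-language phone sequences from a shared phone inventory (illustrative only).
data = {
    "lang_A": [["a", "b", "a", "b"]] * 3,
    "lang_B": [["c", "d", "c", "d"]] * 3,
}
models = {lang: (train_bigram(s), train_unigram(s)) for lang, s in data.items()}
print(identify(["a", "b", "a"], models))  # lang_A
```

The interpolation weight `lam` mirrors the convex combination in the paper; in the actual system the second scorer would be a trained RNNLM over the universal phone inventory rather than a unigram model.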
Pages: 2144-2151
Page count: 8