Significance of neural phonotactic models for large-scale spoken language identification

被引:0
|
作者
Srivastava, Brij Mohan Lal [1 ]
Vydana, Hari [1 ]
Vuppala, Anil Kumar [1 ]
Shrivastava, Manish [1 ]
机构
[1] Int Inst Informat Technol, Language Technol Res Ctr, Hyderabad, Andhra Pradesh, India
来源
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2017年
关键词
RECOGNITION;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Language identification (LID) is vital frontend for spoken dialogue systems operating in diverse linguistic settings to reduce recognition and understanding errors. Existing LID systems which use low-level signal information for classification do not scale well due to exponential growth of parameters as the classes increase. They also suffer performance degradation due to the inherent variabilities of speech signal. In the proposed approach, we model the language-specific phonotactic information in speech using recurrent neural network for developing an LID system. The input speech signal is tokenized to phone sequences by using a common language-independent phone recognizer with varying phonetic coverage. We establish a causal relationship between phonetic coverage and LID performance. The phonotactics in the observed phone sequences are modeled using statistical and recurrent neural network language models to predict language-specific symbol from a universal phonetic inventory. Proposed approach is robust, computationally light weight and highly scalable. Experiments show that the convex combination of statistical and recurrent neural network language model (RNNLM) based phonotactic models significantly outperform a strong baseline system of Deep Neural Network (DNN) which is shown to surpass the performance of i-vector based approach for LID. The proposed approach outperforms the baseline models in terms of mean F1 score over 176 languages. Further we provide significant information-theoretic evidence to analyze the mechanism of the proposed approach.
引用
收藏
页码:2144 / 2151
页数:8
相关论文
共 50 条
  • [21] Large-Scale and Language-Oblivious Code Authorship Identification
    Abuhamad, Mohammed
    AbuHmed, Tamer
    Mohaisen, Aziz
    Nyang, DaeHun
    PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018, : 101 - 114
  • [22] Spoken Language Identification Based on GMM Models
    Dustor, Adam
    Szwarc, Pawel
    INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS (ICSES '10): CONFERENCE PROCEEDINGS, 2010, : 105 - 108
  • [23] Limits of Detecting Text Generated by Large-Scale Language Models
    Varshney, Lav R.
    Keskar, Nitish Shirish
    Socher, Richard
    2020 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2020,
  • [24] On the Multilingual Capabilities of Very Large-Scale English Language Models
    Armengol-Estape, Jordi
    de Gibert Bonet, Ona
    Melero, Maite
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3056 - 3068
  • [25] Large-Scale Random Forest Language Models for Speech Recognition
    Su, Yi
    Jelinek, Frederick
    Khudanpur, Sanjeev
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 945 - 948
  • [26] Large-Scale Language Models for Sarcasm Detection with Data Augmentation
    Zhang, Linrui
    Copus, Belinda
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024, 2024, 14763 : 1 - 9
  • [27] Towards Artwork Explanation in Large-scale Vision Language Models
    Hayashi, Kazuki
    Sakai, Yusuke
    Kamigaito, Hidetaka
    Hayashi, Katsuhiko
    Watanabe, Taro
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 705 - 729
  • [28] Neural network classifiers for language identification using phonotactic and prosodic features
    Mary, L
    Rao, KS
    Yegnanarayana, B
    2005 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSING AND INFORMATION PROCESSING, PROCEEDINGS, 2005, : 404 - 408
  • [29] An Investigation of Applying Large Language Models to Spoken Language Learning
    Gao, Yingming
    Nuchged, Baorian
    Li, Ya
    Peng, Linkai
    APPLIED SCIENCES-BASEL, 2024, 14 (01):
  • [30] MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
    Cai, Yan
    Wang, Linlin
    Wang, Ye
    de Melo, Gerard
    Zhang, Ya
    Wang, Yanfeng
    He, Liang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17709 - 17717