CONTEXT-DEPENDENT MODELLING OF DEEP NEURAL NETWORK USING LOGISTIC REGRESSION

Citations: 0
Authors
Wang, Guangsen [1 ]
Sim, Khe Chai [1 ]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117548, Singapore
Keywords
Context-Dependent Modelling; Deep Neural Network; Logistic Regression; Canonical State Modelling; Articulatory Features; Recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The data sparsity problem of context-dependent acoustic modelling in automatic speech recognition is addressed by using the decision tree state clusters as the training targets in the standard context-dependent (CD) deep neural network (DNN) systems. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In this paper, we formulate the CD DNN as an instance of the canonical state modelling technique based on a set of broad phone classes to address both the data sparsity and the clustering problems. The triphone is clustered into multiple sets of shorter biphones using broad phone contexts to address the data sparsity issue. A DNN is trained to discriminate the biphones within each set. The canonical states are represented by the concatenated log posteriors of all the broad phone DNNs. Logistic regression is used to transform the canonical states into the triphone state output probability. Clustering of the regression parameters is used to reduce model complexity while still achieving unique acoustic scores for all possible triphones. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD DNN significantly outperforms the standard CD DNN. The best system provides a 2.7% absolute WER reduction compared to the best standard CD DNN system.
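The regression step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact parameterisation, so everything below is an assumption made for illustration: each triphone state is assumed to map to one biphone per broad phone set (the hypothetical `biphone_index` table), its score is assumed to be a weighted sum of the corresponding biphone log posteriors, and the weights and biases are assumed to be shared within a regression cluster (the hypothetical `cluster_of` table). All names and numbers are invented toy values, not taken from the paper.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log of the softmax of a 1-D array."""
    z = np.asarray(z, dtype=float)
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def triphone_log_posteriors(dnn_logits, biphone_index, cluster_of, weights, biases):
    """Toy multinomial logistic regression over canonical states.

    dnn_logits    : list of K 1-D arrays, one output layer per broad phone DNN
    biphone_index : (S, K) ints, biphone of triphone state s in set k (assumed mapping)
    cluster_of    : (S,) ints, regression cluster of each triphone state (assumed)
    weights       : (C, K) regression weights shared within a cluster
    biases        : (C,) regression biases shared within a cluster
    """
    # Log posteriors of each broad phone DNN; concatenated, these would form
    # the canonical state vector mentioned in the abstract.
    log_posts = [log_softmax(z) for z in dnn_logits]
    num_sets = len(log_posts)
    # features[s, k] = log posterior of the biphone that state s maps to in set k
    features = np.stack(
        [log_posts[k][biphone_index[:, k]] for k in range(num_sets)], axis=1
    )
    # Cluster-shared weights applied to state-specific features
    scores = (weights[cluster_of] * features).sum(axis=1) + biases[cluster_of]
    # Normalise over all triphone states
    return log_softmax(scores)


# Toy setup: 2 broad phone DNNs with 3 biphone outputs each,
# 4 triphone states grouped into 2 regression clusters.
dnn_logits = [np.array([2.0, 0.5, -1.0]), np.array([0.1, 1.5, 0.3])]
biphone_index = np.array([[0, 0], [0, 1], [1, 1], [2, 2]])
cluster_of = np.array([0, 0, 1, 1])
weights = np.array([[1.0, 0.5], [0.8, 1.2]])
biases = np.array([0.0, -0.1])

log_p = triphone_log_posteriors(dnn_logits, biphone_index, cluster_of, weights, biases)
print(log_p)
```

In this toy setup, states 0 and 1 share one set of regression parameters yet receive different scores, because they select different biphone posteriors. This is one plausible reading of how parameter clustering can coexist with unique acoustic scores for every triphone; the paper itself should be consulted for the actual formulation.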
Pages: 338-343
Number of pages: 6