ON LATTICE-FREE BOOSTED MMI TRAINING OF HMM AND CTC-BASED FULL-CONTEXT ASR MODELS

Cited by: 2
Authors
Zhang, Xiaohui [1 ]
Manohar, Vimal [1 ]
Zhang, David [1 ]
Zhang, Frank [1 ]
Shi, Yangyang [1 ]
Singhal, Nayan [1 ]
Chan, Julian [1 ]
Peng, Fuchun [1 ]
Saraf, Yatharth [1 ]
Seltzer, Mike [1 ]
Affiliation
[1] Facebook AI, Menlo Park, CA 94025 USA
Keywords
LF-MMI; CTC; HMM; modeling units; boost;
DOI
10.1109/ASRU51503.2021.9688056
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, the two approaches have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies, and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM and CTC topologies. From this framework, we propose three novel training schemes: chenone(ch)-CTC-bMMI, wordpiece(wp)-CTC-bMMI, and wordpiece(wp)-HMM-bMMI, which offer different advantages in training performance, decoding efficiency, and decoding time-stamp accuracy. The advantages of the different training schemes are evaluated comprehensively on LibriSpeech, and wp-CTC-bMMI and ch-CTC-bMMI are further evaluated on two real-world ASR tasks to demonstrate their effectiveness. In addition, we show that bi-char(bc) HMM-MMI models can serve as better alignment models than traditional non-neural GMM-HMMs.
Pages: 1026 - 1033
Page count: 8
Related Papers
  • [1] Purely sequence-trained neural networks for ASR based on lattice-free MMI
    Povey, Daniel
    Peddinti, Vijayaditya
    Galvez, Daniel
    Ghahremani, Pegah
    Manohar, Vimal
    Na, Xingyu
    Wang, Yiming
    Khudanpur, Sanjeev
    17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp. 2751 - 2755
  • [2] Sequence Discriminative Training for Offline Handwriting Recognition by an Interpolated CTC and Lattice-Free MMI Objective Function
    Hu, Wenping
    Cai, Meng
    Chen, Kai
    Ding, Haisong
    Sun, Lei
    Liang, Sen
    Mo, Xiongjian
    Huo, Qiang
    14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Vol. 1, pp. 61 - 66
  • [3] Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI
    Manohar, Vimal
    Hadian, Hossein
    Povey, Daniel
    Khudanpur, Sanjeev
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4844 - 4848
  • [4] Domain adaptation of lattice-free MMI based TDNN models for speech recognition
    Long, Y.
    Li, Y.
    Ye, H.
    Mao, H.
    International Journal of Speech Technology, 2017, 20 (1): 171 - 178
  • [5] Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation
    Luo, Dean
    Xia, Linzhong
    Guan, Mingxiang
    Mobile Networks & Applications, 2022, 27 (4): 1604 - 1611