CTC Training of Multi-Phone Acoustic Models for Speech Recognition

Cited by: 6
Author
Siohan, Olivier [1 ]
Affiliation
[1] Google, Mountain View, CA 94043 USA
Keywords
acoustic modeling; CTC; multi-phone units; pronunciation modeling;
DOI
10.21437/Interspeech.2017-505
CLC Classification Number
TP18 [人工智能理论];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Phone-sized acoustic units such as triphones cannot properly capture the long-term co-articulation effects that occur in spontaneous speech. For that reason, it is interesting to construct acoustic units covering a longer time-span such as syllables or words. Unfortunately, the frequency distribution of those units is such that a few high-frequency units account for most of the tokens, while many units rarely occur. As a result, those units suffer from data sparsity and can be difficult to train. In this paper we propose a scalable data-driven approach to construct a set of salient units made of sequences of phones called M-phones. We illustrate that since the decomposition of a word sequence into a sequence of M-phones is ambiguous, those units are well suited to be used with a connectionist temporal classification (CTC) approach, which does not rely on an explicit frame-level segmentation of the word sequence into a sequence of acoustic units. Experiments are presented on a Voice Search task using 12,500 hours of training data.
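The ambiguity the abstract refers to can be made concrete: given an inventory of multi-phone units, the same phone string usually admits several valid segmentations, and CTC training marginalizes over alignments rather than committing to one. The following is a minimal illustrative sketch of that ambiguity; the inventory and phone string are invented for the example and are not the M-phone inventory from the paper.

```python
from functools import lru_cache

# Hypothetical M-phone inventory: all single phones plus a few longer
# units. Purely illustrative — not the data-driven inventory of the paper.
INVENTORY = {("h",), ("e",), ("l",), ("o",),
             ("h", "e"), ("l", "o"), ("e", "l", "o")}

def decompositions(phones):
    """Enumerate every segmentation of `phones` into inventory units.

    The existence of multiple results for one input is the ambiguity
    that motivates training with CTC, which sums over all consistent
    alignments instead of requiring a single fixed segmentation.
    """
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def go(i):
        # All segmentations of the suffix phones[i:].
        if i == len(phones):
            return [[]]
        out = []
        for j in range(i + 1, len(phones) + 1):
            unit = phones[i:j]
            if unit in INVENTORY:
                out.extend([unit] + rest for rest in go(j))
        return out

    return go(0)

# The phone string "hello" has four distinct decompositions under
# this toy inventory, e.g. [("h","e"), ("l",), ("l","o")].
for seg in decompositions("hello"):
    print(seg)
```

Every decomposition spells out the same phone string, so a segmentation-free criterion like CTC can treat all of them as valid targets.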
Pages: 709 - 713
Page count: 5
Related Papers
50 records in total
  • [1] Phone Synchronous Speech Recognition With CTC Lattices
    Chen, Zhehuai
    Zhuang, Yimeng
    Qian, Yanmin
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01) : 90 - 101
  • [2] PERSONALIZATION OF CTC SPEECH RECOGNITION MODELS
    Dingliwal, Saket
    Sunkara, Monica
    Ronanki, Srikanth
    Farris, Jeff
    Kirchhoff, Katrin
    Bodapati, Sravan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 302 - 309
  • [3] Multi-domain adversarial training of neural network acoustic models for distant speech recognition
    Mirsamadi, Seyedmahdad
    Hansen, John H. L.
    SPEECH COMMUNICATION, 2019, 106 : 21 - 30
  • [5] Exposure assessment for a wireless multi-phone charger
    Kang, Woo-Geun
    Alexander, Zhbanov
    Jun, Hae-Young
    Park, Yong-Ho
    Pack, Jeong-Ki
    2014 INTERNATIONAL SYMPOSIUM ON ELECTROMAGNETIC COMPATIBILITY, TOKYO (EMC'14/TOKYO), 2014, : 198 - 201
  • [6] Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin speech recognition
    Zhao, Yuanyuan
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [7] A Speech Recognition Acoustic Model Based on LSTM-CTC
    Zhang, Yiwen
    Lu, Xuanmin
    2018 IEEE 18TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT), 2018, : 1052 - 1055
  • [8] Unsupervised training of acoustic models for large vocabulary continuous speech recognition
    Wessel, F
    Ney, H
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 307 - 310
  • [9] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82
  • [10] On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition
    Mirsamadi, Seyedmahdad
    Hansen, John H. L.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 404 - 408