Unsupervised Lattice-based Acoustic Model Adaptation for Speaker-Dependent Conversational Telephone Speech Transcription

Cited by: 0
Authors
Thambiratnam, K. [1]
Seide, E. [1]
Affiliations
[1] Microsoft Res Asia, 5F Sigma Ctr, Beijing 100080, Peoples R China
Keywords
Unsupervised Acoustic Model Adaptation; Conversational Speech Recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper examines the application of lattice adaptation techniques to speaker-dependent models for the purpose of conversational telephone speech transcription. Given sufficient training data per speaker, it is feasible to build adapted speaker-dependent models using lattice MLLR and lattice MAP. Experiments on iterative and cascaded adaptation are presented. Additionally, various strategies for thresholding frame posteriors are investigated, and it is shown that accumulating statistics from the local best-confidence path is sufficient to achieve optimal adaptation. Overall, an iterative cascaded lattice system reduced WER by 7.0% absolute, a 0.8% absolute gain over transcript-based adaptation. Lattice adaptation reduced the gap between unsupervised and supervised adaptation from 2.5% to 1.7%.
Pages: 1567-1570
Page count: 4
Related papers
50 in total
  • [21] Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
    Shen, Peng
    Lu, Xugang
    Hu, Xinhui
    Kanda, Naoyuki
    Saiko, Masahiro
    Hori, Chiori
    Kawai, Hisashi
    [J]. SPEECH COMMUNICATION, 2016, 82 : 1 - 13
  • [22] SPEAKER-DEPENDENT WAVENET-BASED DELAY-FREE ADPCM SPEECH CODING
    Yoshimura, Takenori
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7145 - 7149
  • [23] Speaker-dependent Isolated-Word Speech Recognition System Based on Vector Quantization
    Zhao, Yinyin
    Zhu, Lei
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER NETWORK, ELECTRONIC AND AUTOMATION (ICCNEA), 2017, : 133 - 137
  • [24] Acoustic training from heterogeneous data sources: Experiments in mandarin conversational telephone speech transcription
    Tsakalidis, S
    Byrne, W
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 461 - 464
  • [25] N-Best-based unsupervised speaker adaptation for speech recognition
    Matsui, T
    Furui, S
    [J]. COMPUTER SPEECH AND LANGUAGE, 1998, 12 (01): : 41 - 50
  • [26] A UNIFIED SPEAKER-DEPENDENT SPEECH SEPARATION AND ENHANCEMENT SYSTEM BASED ON DEEP NEURAL NETWORKS
    Gao, Tian
    Du, Jun
    Xu, Li
    Liu, Cong
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 687 - 691
  • [27] Speech Data Clustering Based on Phoneme Error Trend for Unsupervised Acoustic Model Adaptation
    Asami, Taichi
    Kobashikawa, Satoshi
    Masataki, Hirokazu
    Yoshioka, Osamu
    Takahashi, Satoshi
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1758 - 1761
  • [28] A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition
    Tu, Yan-Hui
    Du, Jun
    Dai, Li-Rung
    Lee, Chin-Hui
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [29] KL-divergence Regularized Deep Neural Network Adaptation for Low-resource Speaker-dependent Speech Enhancement
    Chai, Li
    Du, Jun
    Lee, Chin-Hui
    [J]. INTERSPEECH 2019, 2019, : 1806 - 1810
  • [30] SPEAKER AGE ESTIMATION ON CONVERSATIONAL TELEPHONE SPEECH USING SENONE POSTERIOR BASED I-VECTORS
    Sadjadi, Seyed Omid
    Ganapathy, Sriram
    Pelecanos, Jason W.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5040 - 5044