On maximum mutual information speaker-adapted training

Cited by: 0
Authors
McDonough, J [1 ]
Schaaf, T [1 ]
Waibel, A [1 ]
Affiliations
[1] Univ Karlsruhe, Inst Log Komplexitat & Deduktionsyst, Interact Syst Labs, D-76128 Karlsruhe, Germany
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206 ; 082403 ;
Abstract
In this work, we combine maximum mutual information-based parameter estimation with speaker-adapted training (SAT). As will be shown, this can be achieved by performing unsupervised parameter estimation on the test data, a distinct advantage for many recognition tasks involving conversational speech. We also propose an approximation to the maximum likelihood and maximum mutual information SAT re-estimation formulae that greatly reduces the amount of disk space required to conduct training on corpora such as Broadcast News, which contains speech from thousands of speakers. We present the results of a set of speech recognition experiments on three test sets: the English Spontaneous Scheduling Task corpus, Broadcast News, and a new corpus of Meeting Room data collected at the Interactive Systems Laboratories of Carnegie Mellon University.
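For context (a conventional formulation, not taken from this record): the maximum mutual information criterion referred to in the abstract is typically written as the log posterior of the reference transcriptions, summed over training utterances. Here O_r denotes the acoustic observations of utterance r, w_r its reference transcription, p_lambda the acoustic likelihood under model parameters lambda, and P(w) the language-model prior; all notation is assumed, not quoted from the paper.

```latex
% Standard MMI objective (conventional notation; a sketch for orientation):
% O_r : acoustic observations for utterance r
% w_r : reference transcription of utterance r
% p_\lambda : acoustic likelihood under model parameters \lambda
% P(w) : language-model prior over word sequences w
F_{\mathrm{MMI}}(\lambda)
  = \sum_{r} \log
    \frac{p_\lambda(O_r \mid w_r)\, P(w_r)}
         {\sum_{w} p_\lambda(O_r \mid w)\, P(w)}
```

In SAT, the likelihoods above are additionally conditioned on per-speaker transforms, which is what makes the unsupervised estimation on test data mentioned in the abstract possible.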
Pages: 601-604
Page count: 4