On maximum mutual information speaker-adapted training

Times cited: 0

Authors:

McDonough, J [1]

Schaaf, T [1]

Waibel, A [1]

Affiliation:

[1] Univ Karlsruhe, Inst Log Komplexitat & Deduktionsyst, Interact Syst Labs, D-76128 Karlsruhe, Germany

Source:

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS | 2002

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this work, we combine maximum mutual information (MMI)-based parameter estimation with speaker-adapted training (SAT). As will be shown, this can be achieved by performing unsupervised parameter estimation on the test data, a distinct advantage for many recognition tasks involving conversational speech. We also propose an approximation to the maximum likelihood (ML) and MMI SAT re-estimation formulae that greatly reduces the disk space required to conduct training on corpora such as Broadcast News, which contains speech from thousands of speakers. We present the results of speech recognition experiments on three test sets: the English Spontaneous Scheduling Task corpus, Broadcast News, and a new corpus of Meeting Room data collected at the Interactive Systems Laboratories of Carnegie Mellon University.
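The abstract does not spell out the MMI criterion itself. As a point of reference, here is a minimal Python sketch of the conventional MMI objective, the summed log posterior of each reference transcription, under stated assumptions: the denominator sum over all word sequences is approximated by a short list of competing hypotheses (an N-best list), the acoustic log likelihoods are assumed to have been computed already with each speaker's adapted model, and the names `Utterance`, `log_sum_exp`, and `mmi_objective` are hypothetical. It is not the authors' re-estimation procedure and does not implement their disk-saving approximation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class Utterance:
    # log p(O_r | W) for each competing hypothesis W; assumed (hypothetically)
    # to be scored with the speaker-adapted model of the utterance's speaker.
    acoustic_logliks: List[float]
    # log P(W) from the language model for each hypothesis.
    lm_logprobs: List[float]
    # Position of the reference (numerator) transcription W_r in the lists.
    ref_index: int


def mmi_objective(utts: Sequence[Utterance]) -> float:
    """F = sum_r [log p(O_r|W_r)P(W_r) - log sum_W p(O_r|W)P(W)].

    The sum over W in the denominator is approximated by the hypotheses
    stored in each Utterance (an N-best list), which should include W_r.
    """
    total = 0.0
    for u in utts:
        # Joint log score log p(O_r|W) + log P(W) for every hypothesis.
        joint = [a + l for a, l in zip(u.acoustic_logliks, u.lm_logprobs)]
        # Log posterior of the reference; never positive.
        total += joint[u.ref_index] - log_sum_exp(joint)
    return total


if __name__ == "__main__":
    # Two toy utterances with made-up scores, for illustration only.
    utts = [
        Utterance([-90.0, -100.0], [-5.0, -5.0], ref_index=0),
        Utterance([-120.0, -118.0, -125.0], [-4.0, -6.0, -3.0], ref_index=0),
    ]
    print(f"MMI objective: {mmi_objective(utts):.4f}")
```

Maximizing this quantity raises the score of the reference relative to its competitors, which is what distinguishes MMI from ML estimation. In the unsupervised setting the abstract describes, no reference transcription is available for the test data, so the numerator would presumably have to come from a recognition hypothesis; this sketch leaves that choice open.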

Pages: 601-604

Number of pages: 4

50 records in total; the first 10 are listed below:

[1] On maximum mutual information speaker-adapted training
McDonough, John
Woelfel, Matthias
Stoimenov, Emilian
[J]. COMPUTER SPEECH AND LANGUAGE, 2008, 22(2): 130-147
[2] Speaker-adapted training on the Switchboard Corpus
McDonough, J
Anastasakos, T
Zavaliagkos, G
Gish, H
[J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-V, 1997: 1059-1062
[3] Maximum mutual information speaker adapted training with semi-tied covariance matrices
McDonough, J
Waibel, A
[J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003: 128-131
[4] Speech separation using speaker-adapted eigenvoice speech models
Weiss, Ron J.
Ellis, Daniel P. W.
[J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24(1): 16-29
[5] ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks
Angel del-Agua, Miguel
Piqueras, Santiago
Gimenez, Adria
Sanchis, Alberto
Civera, Jorge
Juan, Alfons
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 3464-3468
[6] Mutual Information Enhanced Training for Speaker Embedding
Tu, Youzhi
Mak, Man-Wai
[J]. INTERSPEECH 2021, 2021: 91-95
[7] Speaker-adapted confidence measures for speech recognition of video lectures
Sanchez-Cortina, Isaias
Andres-Ferrer, Jesus
Sanchis, Alberto
Juan, Alfons
[J]. COMPUTER SPEECH AND LANGUAGE, 2016, 37: 11-23
[8] Speaker-adapted neural-network-based fusion for multimodal reference resolution
Kleingarn, Diana
Nabizadeh, Nima
Heckmann, Martin
Kolossa, Dorothea
[J]. 20TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2019), 2019: 210-214
[9] Speaker-Adapted Confidence Measures for ASR Using Deep Bidirectional Recurrent Neural Networks
Angel Del-Agua, Miguel
Gimenez, Adria
Sanchis, Albert
Civera, Jorge
Juan, Alfons
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26(7): 1194-1202
[10] Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM
Wang, Longbiao
Kitaoka, Norihide
Nakagawa, Seiichi
[J]. SPEECH COMMUNICATION, 2007, 49(6): 501-513