SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Ochiai, Tsubasa [1 ,2 ]
Matsuda, Shigeki [1 ]
Lu, Xugang [1 ]
Hori, Chiori [1 ]
Katagiri, Shigeru [2 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Kyoto, Japan
[2] Doshisha Univ, Grad Sch Engn, Kyoto, Japan
Keywords
Speaker Adaptive Training; Deep Neural Network; Adaptation
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Among the many speaker adaptation approaches, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are associated with Gaussian Mixture Models (GMMs). Meanwhile, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces GMMs with Deep Neural Networks (DNNs), outperforms conventional GMM-HMM recognizers. Along these two lines, it is natural to pursue further improvement of an existing DNN-HMM recognizer by employing SAT. In this paper, we propose a novel training scheme that applies SAT to an SI DNN-HMM recognizer. We implement the SAT scheme by allocating a Speaker-Dependent (SD) module to one of the intermediate layers of a seven-layer DNN, and evaluate its utility on TED Talks corpus data. Experimental results show that our speaker-adapted SAT-based DNN-HMM recognizer reduces the word error rate by 8.4% compared with a baseline SI DNN-HMM recognizer and, regardless of where the SD module is allocated, outperforms the conventional speaker adaptation scheme. The results also show that the inner layers of the DNN are more suitable for the SD module than the outer layers.
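The SD-module arrangement described in the abstract can be sketched as follows. This is a minimal illustration only, written in PyTorch-style Python; the layer sizes, activation functions, speaker identifiers, and names such as SATDNN and sd_modules are hypothetical assumptions for exposition, not the authors' implementation.

import torch
import torch.nn as nn

class SATDNN(nn.Module):
    """Seven-layer feedforward acoustic model in which one intermediate layer is a
    Speaker-Dependent (SD) module; all remaining layers are shared, Speaker-Independent (SI)."""
    def __init__(self, feat_dim=440, hidden_dim=2048, num_states=4000,
                 num_layers=7, sd_layer_index=3, speakers=()):
        super().__init__()
        # SI layers below the SD position (layer 0 maps input features to hidden_dim).
        self.lower = nn.Sequential(*[
            nn.Sequential(nn.Linear(feat_dim if i == 0 else hidden_dim, hidden_dim),
                          nn.Sigmoid())
            for i in range(sd_layer_index)])
        # SI layers above the SD position, before the output layer.
        self.upper = nn.Sequential(*[
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
            for _ in range(num_layers - sd_layer_index - 2)])
        self.output = nn.Linear(hidden_dim, num_states)  # HMM-state scores (pre-softmax)
        # One SD module per training speaker; every speaker shares the SI layers.
        self.sd_modules = nn.ModuleDict({
            spk: nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
            for spk in speakers})

    def forward(self, x, speaker_id):
        h = self.lower(x)
        h = self.sd_modules[speaker_id](h)  # speaker-specific transformation
        h = self.upper(h)
        return self.output(h)

# Hypothetical usage: two training speakers, a batch of 8 feature vectors.
model = SATDNN(speakers=("spk001", "spk002"))
logits = model(torch.randn(8, 440), speaker_id="spk001")

In this reading of the scheme, the shared SI layers and all per-speaker SD modules are trained jointly over the multi-speaker training data (the SAT step); for an unseen test speaker, a new SD module would then be estimated from that speaker's adaptation data while the shared layers are kept fixed.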
Pages: 5