SPEAKER ADAPTIVE TRAINING USING DEEP NEURAL NETWORKS

Times cited: 0

Authors:

Ochiai, Tsubasa [1,2]

Matsuda, Shigeki [1]

Lu, Xugang [1]

Hori, Chiori [1]

Katagiri, Shigeru [2]

Affiliations:

[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Kyoto, Japan

[2] Doshisha Univ, Grad Sch Engn, Kyoto, Japan

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Speaker Adaptive Training; Deep Neural Network; ADAPTATION

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification code:

070206; 082403

Abstract:

Among the many approaches to speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to standard Hidden Markov Model (HMM) speech recognizers whose states are modeled by Gaussian Mixture Models (GMMs). Meanwhile, recent studies on Speaker-Independent (SI) recognizer development have reported that a new type of HMM speech recognizer, which replaces the GMMs with Deep Neural Networks (DNNs), outperforms GMM-HMM recognizers. Combining these two lines of work, it is natural to seek further improvement of an existing DNN-HMM recognizer by employing SAT. In this paper, we propose a novel training scheme that applies SAT to an SI DNN-HMM recognizer. We implement the scheme by allocating a Speaker-Dependent (SD) module to one of the intermediate layers of a seven-layer DNN, and evaluate its utility on TED Talks corpus data. Experimental results show that our speaker-adapted SAT-based DNN-HMM recognizer reduces the word error rate by 8.4% compared with a baseline SI DNN-HMM recognizer and, regardless of which layer the SD module is allocated to, outperforms the conventional speaker adaptation scheme. The results also show that the inner layers of the DNN are more suitable for the SD module than the outer layers.
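The abstract describes the SAT-based DNN-HMM only at the architectural level: one intermediate layer of a seven-layer DNN becomes a speaker-dependent (SD) module, while the remaining layers stay shared across speakers. The Python sketch below illustrates that idea under explicit assumptions, and is not the authors' implementation: "seven-layer" is read here as seven weight layers, hidden units are assumed to be sigmoid with a softmax output over HMM states, and the class name `SatDnn`, the layer sizes, the phase names `"sat"`/`"adapt"`, and the initialization of an unseen speaker's SD module (element-wise mean of the training speakers' modules) are all hypothetical choices. It shows only the forward pass and which parameter groups each phase would update; backpropagation is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small uniform random weights; an arbitrary choice for this sketch.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(v: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


class SatDnn:
    """Hypothetical feed-forward net: weight layer `sd_index` is speaker-dependent
    (one matrix per speaker); every other layer is shared (speaker-independent)."""

    def __init__(self, dims: Sequence[int], sd_index: int,
                 speakers: Sequence[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_layers = len(dims) - 1
        if not 0 <= sd_index < self.num_layers:
            raise ValueError("sd_index must name one of the weight layers")
        self.sd_index = sd_index
        # Shared (SI) weight matrices, keyed by layer index.
        self.shared: Dict[int, Matrix] = {
            i: random_matrix(dims[i + 1], dims[i], rng)
            for i in range(self.num_layers) if i != sd_index
        }
        # One SD weight matrix per training speaker for the chosen layer.
        self.sd: Dict[str, Matrix] = {
            spk: random_matrix(dims[sd_index + 1], dims[sd_index], rng)
            for spk in speakers
        }

    def forward(self, x: Sequence[float], speaker: str) -> List[float]:
        h = list(x)
        for i in range(self.num_layers):
            # Route through the speaker's own SD matrix at layer sd_index only.
            w = self.sd[speaker] if i == self.sd_index else self.shared[i]
            z = matvec(w, h)
            # Sigmoid hidden layers, softmax over HMM states at the output.
            h = softmax(z) if i == self.num_layers - 1 else sigmoid(z)
        return h

    def add_speaker(self, speaker: str) -> None:
        # Hypothetical initialization for an unseen test speaker: the
        # element-wise mean of the existing speakers' SD matrices.
        mats = list(self.sd.values())
        rows, cols = len(mats[0]), len(mats[0][0])
        self.sd[speaker] = [[sum(m[r][c] for m in mats) / len(mats)
                             for c in range(cols)] for r in range(rows)]

    def trainable(self, phase: str, speaker: str = "") -> List[str]:
        """Names of the parameter groups updated in each (hypothetical) phase."""
        if phase == "sat":
            # SAT training: shared layers and all SD modules are updated.
            return ([f"shared[{i}]" for i in sorted(self.shared)]
                    + [f"sd[{s}]" for s in sorted(self.sd)])
        if phase == "adapt":
            # Adaptation: SI layers frozen; only the new speaker's SD module moves.
            return [f"sd[{speaker}]"]
        raise ValueError(f"unknown phase: {phase}")


if __name__ == "__main__":
    # Hypothetical sizes: 40-dim features, six hidden layers of 64, 100 HMM states.
    net = SatDnn(dims=[40, 64, 64, 64, 64, 64, 64, 100], sd_index=3,
                 speakers=["spk1", "spk2"])
    net.add_speaker("test_spk")
    post = net.forward([0.0] * 40, "test_spk")
    print(len(post), round(sum(post), 6))  # 100 1.0
    print(net.trainable("adapt", "test_spk"))  # ['sd[test_spk]']
```

Moving `sd_index` toward the middle of the stack corresponds to the abstract's observation that inner layers suit the SD module better than outer ones; the sketch itself makes no claim about recognition accuracy.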

Pages: 5
