AN INVESTIGATION INTO LEARNING EFFECTIVE SPEAKER SUBSPACES FOR ROBUST UNSUPERVISED DNN ADAPTATION

Cited by: 0
Authors:
Samarakoon, Lahiru [1 ,2 ]
Sim, Khe Chai [3 ]
Mak, Brian [2 ]
Affiliations:
[1] Natl Univ Singapore, Singapore, Singapore
[2] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[3] Google Inc, Mountain View, CA USA
Keywords:
Automatic Speech Recognition; DNN Adaptation; Subspace Methods; Neural Network; Transformations
DOI: Not available
CLC Number: O42 [Acoustics]
Subject Classification Codes: 070206; 082403
Abstract:
Subspace methods are used for deep neural network (DNN)-based acoustic model adaptation. These methods first construct a subspace and then perform speaker adaptation as a point in that subspace. This paper investigates the effectiveness of subspace methods for robust unsupervised adaptation. For the analysis, we compare two state-of-the-art subspace methods: singular value decomposition (SVD)-based bottleneck adaptation and factorized hidden layer (FHL) adaptation. Both methods perform speaker adaptation as a linear combination of rank-1 bases. The main difference in subspace construction is that FHL adaptation builds a speaker subspace separate from the phoneme classification space, whereas SVD-based bottleneck adaptation shares the same subspace for both phoneme classification and speaker adaptation. So far, no direct comparison between these two methods has been reported. In this work, we compare their robustness to unsupervised adaptation on the Aurora 4, AMI IHM, and AMI SDM tasks. Our findings show that FHL adaptation outperforms SVD-based bottleneck adaptation, especially in challenging conditions where the adaptation data is limited or the quality of the adaptation alignments is low.
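The two subspace constructions contrasted in the abstract can be made concrete with a short NumPy sketch. This is an illustrative assumption of the structure only (toy dimensions, hypothetical variable names, and a simplified per-speaker diagonal transform), not the authors' implementation: FHL adds a separately trained speaker subspace of rank-1 bases on top of the speaker-independent weights, while SVD-based bottleneck adaptation re-weights the shared low-rank factorization of those weights.

    # Sketch of speaker adaptation as a linear combination of rank-1 bases.
    # All names and dimensions are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, k = 64, 64, 8              # layer dims and subspace rank (assumed)

    # Speaker-independent weight matrix of one hidden layer.
    W = rng.standard_normal((d_out, d_in)) * 0.1

    # FHL-style adaptation: a separate speaker subspace spanned by k rank-1
    # bases u_i v_i^T; the speaker is represented by a k-dim vector d_s.
    U = rng.standard_normal((d_out, k)) * 0.01   # left bases (trained with the model)
    V = rng.standard_normal((d_in, k)) * 0.01    # right bases (trained with the model)
    d_s = rng.standard_normal(k)                 # per-speaker weights (estimated at test time)
    W_fhl = W + U @ np.diag(d_s) @ V.T           # W + sum_i d_s[i] * u_i v_i^T

    # SVD-bottleneck-style adaptation: factorize W itself, keep the top-k
    # components, and insert a per-speaker scaling in that shared bottleneck.
    U_w, s, Vt_w = np.linalg.svd(W, full_matrices=False)
    U_k, s_k, Vt_k = U_w[:, :k], s[:k], Vt_w[:k, :]
    a_s = np.ones(k)                             # per-speaker scales (estimated at test time)
    W_svd = U_k @ np.diag(a_s * s_k) @ Vt_k      # adapts the shared phoneme/speaker subspace

    h = np.tanh(W_fhl @ rng.standard_normal(d_in))   # adapted layer output

In both cases only the small per-speaker vector (d_s or a_s here) is estimated from adaptation data, which is what keeps these methods viable when that data is scarce, as the abstract's robustness findings discuss.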
Pages: 5035-5039
Page count: 5