SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION

Cited by: 1

Authors:

Wang, Disong [1]

Liu, Songxiang [1]

Wu, Xixin [1]

Lu, Hui [1]

Sun, Lifa [2]

Liu, Xunying [1]

Meng, Helen [1,3]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] SpeechX Ltd, Shenzhen, Peoples R China

[3] Ctr Perceptual & Interact Intelligence, Hong Kong, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Dysarthric speech reconstruction; voice conversion; adversarial speaker adaptation; speaker identity; NETWORKS;

DOI:

10.1109/ICASSP43922.2022.9746680

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains a challenge, not only because the speech must be restored to normal, but also because the speaker's identity must be preserved. The speaker representation extracted by a speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. However, the SE may not be able to fully capture the characteristics of previously unseen dysarthric speakers. To address this research problem, we propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid incorporating abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of the reconstructed speech to be close to that of high-quality reference speech. Experiments show that the proposed approach achieves enhanced speaker similarity and speech naturalness comparable to that of a strong baseline approach. Compared with dysarthric speech, the reconstructed speech achieves 22.3% and 31.5% absolute word error rate reduction for speakers with moderate and moderate-severe dysarthria, respectively. Our demo page is released here(1).

Pages: 6677-6681

Page count: 5
