SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION

Cited by: 1
Authors
Wang, Disong [1]
Liu, Songxiang [1]
Wu, Xixin [1]
Lu, Hui [1]
Sun, Lifa [2]
Liu, Xunying [1]
Meng, Helen [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SpeechX Ltd, Shenzhen, Peoples R China
[3] Ctr Perceptual & Interact Intelligence, Hong Kong, Peoples R China
Keywords
Dysarthric speech reconstruction; voice conversion; adversarial speaker adaptation; speaker identity; NETWORKS;
DOI
10.1109/ICASSP43922.2022.9746680
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains a challenge because the system must not only restore the speech to normal but also preserve the speaker's identity. The speaker representation extracted by a speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. However, the SE may not fully capture the characteristics of previously unseen dysarthric speakers. To address this problem, we propose a novel multi-task learning strategy, adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information. The secondary task applies adversarial training to keep abnormal speaking patterns out of the reconstructed speech by regularizing the distribution of the reconstructed speech to be close to that of high-quality reference speech. Experiments show that the proposed approach achieves enhanced speaker similarity and comparable speech naturalness relative to a strong baseline approach. Compared with dysarthric speech, the reconstructed speech achieves absolute word error rate reductions of 22.3% and 31.5% for speakers with moderate and moderate-severe dysarthria, respectively. Our demo page is released here (1).
Pages: 6677-6681
Page count: 5
Related papers
  • [1] Speech compression with preservation of speaker identity
    Leis, J
    Phythian, M
    Sridharan, S
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1711 - 1714
  • [2] Adversarial-Free Speaker Identity-Invariant Representation Learning for Automatic Dysarthric Speech Classification
    Janbakhshi, Parvaneh
    Kodrasi, Ina
    [J]. INTERSPEECH 2022, 2022, : 2138 - 2142
  • [3] Robust speech coding for the preservation of speaker identity
    Phythian, M
    Leis, J
    Sridharan, S
    [J]. ISSPA 96 - FOURTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, PROCEEDINGS, VOLS 1 AND 2, 1996, : 395 - 398
  • [4] Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition
    Kim, Myungjong
    Kim, Younggwan
    Yoo, Joohong
    Wang, Jun
    Kim, Hoirin
    [J]. IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2017, 25 (09) : 1581 - 1591
  • [5] ADVERSARIAL SPEAKER ADAPTATION
    Meng, Zhong
    Li, Jinyu
    Gong, Yifan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5721 - 5725
  • [6] SPEAKER INTONATION ADAPTATION FOR TRANSFORMING TEXT-TO-SPEECH SYNTHESIS SPEAKER IDENTITY
    Langarani, Mahsa Sadat Elyasi
    van Santen, Jan
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 116 - 123
  • [7] SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
    Soleymanpour, Mohammad
    Johnson, Michael T.
    Soleymanpour, Rahim
    Berry, Jeffrey
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7382 - 7386
  • [8] The influence of speaker and listener variables on intelligibility of dysarthric speech
    Patel, Rupal
    Usher, Nicole
    Kember, Heather
    Russell, Scott
    Laures-Gore, Jacqueline
    [J]. JOURNAL OF COMMUNICATION DISORDERS, 2014, 51 : 13 - 18
  • [9] Comparing Speaker-Dependent and Speaker-Adaptive Acoustic Models for Recognizing Dysarthric Speech
    Rudzicz, Frank
    [J]. ASSETS'07: PROCEEDINGS OF THE NINTH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2007, : 255 - 256
  • [10] Speech-to-Speech Conversion: An Approach to Enhance the Speech Intelligibility of Dysarthric Speaker
    Janai, Siddhanna
    Shreekanth, T.
    Chandan, M.
    Abraham, Ajish K.
    [J]. INTERNATIONAL JOURNAL OF AMBIENT COMPUTING AND INTELLIGENCE, 2021, 12 (01) : 184 - 206