WHISPERED AND LOMBARD NEURAL SPEECH SYNTHESIS

Times cited: 8
Authors
Hu, Qiong [1]
Bleisch, Tobias [1]
Petkov, Petko [1]
Raitio, Tuomo [1]
Marchi, Erik [1]
Lakshminarasimhan, Varun [1]
Affiliation
[1] Apple Inc, Cupertino, CA 95014 USA
Keywords
speech synthesis; speaker adaptation; multi-speaker training; Lombard speech; whisper speech; text-to-speech
DOI
10.1109/SLT48900.2021.9383454
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
It is desirable for a text-to-speech system to take into account the environment in which synthetic speech is presented and to provide appropriate context-dependent output to the user. In this paper, we present and compare various approaches for generating different speaking styles, namely normal, Lombard, and whispered speech, using only limited data. The following systems are proposed and assessed: 1) pre-training and fine-tuning a model for each style; 2) Lombard and whisper speech conversion through a signal-processing-based approach; 3) multi-style generation using a single model based on a speaker verification model. Our mean opinion score and AB preference listening tests show that 1) the pre-training/fine-tuning approach generates high-quality speech for all speaking styles, and 2) although our speaker verification (SV) model is not explicitly trained to discriminate between speaking styles, and no Lombard or whispered speech is used to pretrain it, the SV model can serve as a style encoder that generates style embeddings as input to a Tacotron system. We also show that the resulting synthetic Lombard speech yields a significant intelligibility gain.
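The abstract mentions converting speech to Lombard and whispered styles "through a signal processing based approach" without describing the method. Purely as a hedged illustration, and not as the authors' system, the sketch below shows two generic textbook techniques: whisper-like speech obtained by keeping each frame's LPC spectral envelope while replacing the voiced excitation with white noise of matching energy, and a crude Lombard-like effect obtained by pre-emphasis (a flatter spectral tilt) plus a level boost. The function names `whisperize` and `lombardize` and every parameter value (400-sample frames, LPC order 16, `alpha=0.9`, a 6 dB gain) are hypothetical assumptions; the input is assumed to be a non-empty list of float samples.

```python
import math
import random

# Hypothetical helpers illustrating generic techniques; NOT the paper's method.


def _autocorr(frame, order):
    """Biased autocorrelation r[0..order] of one rectangular analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def _levinson_durbin(r, order):
    """LPC polynomial a (a[0] == 1) and residual energy via Levinson-Durbin."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, max(err, 0.0)


def whisperize(x, frame_len=400, order=16, seed=0):
    """Assumed approach: keep each frame's LPC envelope, excite it with white noise."""
    rng = random.Random(seed)
    y = []
    hist = [0.0] * order  # all-pole filter memory, most recent output first
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        r = _autocorr(frame, order)
        r[0] *= 1.0 + 1e-9  # tiny diagonal loading for numerical safety
        a, err = _levinson_durbin(r, order)
        gain = math.sqrt(err / len(frame))  # per-sample residual RMS
        for _ in frame:
            # y[n] = g*u[n] - sum_j a[j]*y[n-j], with u[n] ~ N(0, 1)
            s = gain * rng.gauss(0.0, 1.0) - sum(a[j] * hist[j - 1] for j in range(1, order + 1))
            y.append(s)
            hist = [s] + hist[:-1]
    return y


def lombardize(x, alpha=0.9, gain_db=6.0):
    """Crude Lombard-like effect (assumption): pre-emphasis, then raise the RMS level."""
    emph = [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]
    rms_in = math.sqrt(sum(v * v for v in x) / len(x))
    rms_out = math.sqrt(sum(v * v for v in emph) / len(emph))
    # Restore the original RMS, then apply the requested gain in dB.
    scale = (rms_in / rms_out) * 10 ** (gain_db / 20) if rms_out > 0 else 0.0
    return [scale * v for v in emph]
```

Non-overlapping rectangular frames keep the sketch short; a practical converter would typically use overlapping windowed frames with overlap-add to avoid discontinuities at frame boundaries.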
Pages: 454-461
Page count: 8