WHISPERED AND LOMBARD NEURAL SPEECH SYNTHESIS

Cited by: 8

Authors:

Hu, Qiong [1]

Bleisch, Tobias [1]

Petkov, Petko [1]

Raitio, Tuomo [1]

Marchi, Erik [1]

Lakshminarasimhan, Varun [1]

Affiliations:

[1] Apple Inc, Cupertino, CA 95014 USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

speech synthesis; speaker adaptation; multi-speaker training; Lombard speech; whisper speech; text-to-speech

DOI:

10.1109/SLT48900.2021.9383454

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

It is desirable for a text-to-speech system to take into account the environment where synthetic speech is presented, and provide appropriate context-dependent output to the user. In this paper, we present and compare various approaches for generating different speaking styles, namely, normal, Lombard, and whisper speech, using only limited data. The following systems are proposed and assessed: 1) pre-training and fine-tuning a model for each style; 2) Lombard and whisper speech conversion through a signal-processing-based approach; 3) multi-style generation using a single model based on a speaker verification model. Our mean opinion score and AB preference listening tests show that 1) we can generate high-quality speech through the pre-training/fine-tuning approach for all speaking styles; 2) although our speaker verification (SV) model is not explicitly trained to discriminate between speaking styles, and no Lombard or whisper voice is used to pre-train this system, the SV model can be used as a style encoder for generating different style embeddings as input for a Tacotron system. We also show that the resulting synthetic Lombard speech has a significant positive impact on intelligibility gain.

Citation

Pages: 454-461

Page count: 8

50 items in total

[1] APPLICATION OF NEURAL NETWORKS IN WHISPERED SPEECH RECOGNITION
Grozdic, Dorde T.
Markovic, Branko
Galic, Jovan
Jovicic, Slobodan T.
2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 728 - 731
[2] Voice Conversion for Whispered Speech Synthesis
Cotescu, Marius
Drugman, Thomas
Huybrechts, Goeric
Lorenzo-Trueba, Jaime
Moinet, Alexis
IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 186 - 190
[3] Neural Whispered Speech Detection with Imbalanced Learning
Ashihara, Takanori
Shinohara, Yusuke
Sato, Hiroshi
Moriya, Takafumi
Matsui, Kiyoaki
Fukutomi, Takaaki
Yamaguchi, Yoshikazu
Aono, Yushi
INTERSPEECH 2019, 2019, : 3352 - 3356
[4] Reconstruction of Normal Speech from Whispered Speech based on RBF Neural Network
Tao, Zhi
Tan, Xue-Dan
Han, Tao
Gu, Ji-Hua
Xu, Yi-Shen
Zhao, He-Ming
2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 374 - 377
[5] LOMBARD SPEECH SYNTHESIS USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
Bollepalli, Bajibabu
Airaksinen, Manu
Alku, Paavo
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5505 - 5509
[6] Analysis of HMM-Based Lombard Speech Synthesis
Raitio, Tuomo
Suni, Antti
Vainio, Martti
Alku, Paavo
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2792 - +
[7] Online Lombard-adaptation in incremental speech synthesis
Rottschaefer, Sebastian
Buschmeier, Hendrik
van Welbergen, Herwin
Kopp, Stefan
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 80 - 84
[8] A robust Voiced/Unvoiced phoneme classification from whispered speech using the 'color' of whispered phonemes and Deep Neural Network
Meenakshi, G. Nisha
Ghosh, Prasanta Kumar
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 503 - 507
[9] Analysis and recognition of whispered speech
Ito, T
Takeda, K
Itakura, F
SPEECH COMMUNICATION, 2005, 45 (02) : 139 - 152
[10] Speaker Identification with Whispered Speech mode Using MFCC: Challenges to Whispered Speech Identification
Sardar, V. M.
Shrbahadurkar, S. D.
2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING (ICIP), 2015, : 70 - 74
