Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech

被引：134

作者：

Nakamura, Keigo ^{[1
]}

Toda, Tomoki ^{[1
]}

Saruwatari, Hiroshi ^{[1
]}

Shikano, Kiyohiro ^{[1
]}

机构：

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara 6300192, Japan

来源：

SPEECH COMMUNICATION | 2012年 / 54卷 / 01期

关键词：

Electrolaryngeal speech; Voice conversion; Speaking-aid system; Speech enhancement; Airpressure sensor; Silence excitation; Non-audible murmur; Laryngectomee; MAXIMUM-LIKELIHOOD; LARYNGECTOMY;

D O I：

10.1016/j.specom.2011.07.007

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

An electrolarynx (EL) is a medical device that generates sound source signals to provide laryngectomees with a voice. In this article we focus on two problems of speech produced with an EL (EL speech). One problem is that EL speech is extremely unnatural and the other is that sound source signals with high energy are generated by an EL, and therefore, the signals often annoy surrounding people. To address these two problems, in this article we propose three speaking-aid systems that enhance three different types of EL speech signals: EL speech, EL speech using an air-pressure sensor (EL-air speech), and silent EL speech. The air-pressure sensor enables a laryngectomee to manipulate the F-0 contours of EL speech using exhaled air that flows from the tracheostoma. Silent EL speech is produced with a new sound source unit that generates signals with extremely low energy. Our speaking-aid systems address the poor quality of EL speech using voice conversion (VC), which transforms acoustic features so that it appears as if the speech is uttered by another person. Our systems estimate spectral parameters, F-0 and aperiodic components independently. The result of experimental evaluations demonstrates that the use of an air-pressure sensor dramatically improves F-0 estimation accuracy. Moreover, it is revealed that the converted speech signals are preferred to source EL speech. (C) 2011 Elsevier B.V. All rights reserved.

引用

页码：134 / 146

页数：13

共 50 条

[1] Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion
Tanaka, Kei
Hara, Sunao
Abe, Masanobu
Minagi, Shogo
[J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
[2] Electrolaryngeal Speech Enhancement Based on Statistical Voice Conversion
Nakamura, Keigo
Toda, Tomoki
Saruwatari, Hiroshi
Shikano, Kiyohiro
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1443 - 1446
[3] Voice Conversion Using Bilinear Model Integrated with Joint GMM-based Classification
Sun, Xinjian
Zhang, Xiongwei
Yang, Jibin
Cao, Tieyong
[J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 1225 - 1228
[4] Comparing GMM-based speech transformation systems
Mesbahi, Larbi
Barreaud, Vincent
Boeffard, Olivier
[J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2852 - 2855
[5] Electrolaryngeal Speech Enhancement with Statistical Voice Conversion based on CLDNN
Kobayashi, Kazuhiro
Toda, Tomoki
[J]. 2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2115 - 2119
[6] Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion
Tanaka, Kei
Hara, Sunao
Abe, Masanobu
Sato, Masaaki
Minagi, Shogo
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3384 - 3388
[7] Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion
Hwang, Hsin-Te
Tsao, Yu
Wang, Hsin-Min
Wang, Yih-Ru
Chen, Sin-Horng
[J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
[8] GMM-Based Speaker Gender and Age Classification After Voice Conversion
Pribil, Jiri
Pribilova, Anna
Matousek, Jindrich
[J]. 2016 FIRST INTERNATIONAL WORKSHOP ON SENSING, PROCESSING AND LEARNING FOR INTELLIGENT MACHINES (SPLINE), 2016,
[9] Improving the Quality of Standard GMM-Based Voice Conversion Systems by Considering Physically Motivated Linear Transformations
Zorila, Tudor-Catalin
Erro, Daniel
Hernaez, Inma
[J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, 2012, 328 : 30 - 39
[10] Modulation Spectrum-Based Post-Filter for GMM-Based Voice Conversion
Takamichi, Shinnosuke
Toda, Tomoki
Black, Alan W.
Nakamura, Satoshi
[J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,

← 1 2 3 4 5 →