Whisper to normal speech conversion using pitch estimated from spectrum

Times cited: 10

Authors:

Konno, Hideaki [1]

Kudo, Mineichi [2]

Imai, Hideyuki [2]

Sugimoto, Masanori [2]

Affiliations:

[1] Hokkaido Univ, Hakodate Campus, 1-2 Hachiman Cho, Hakodate, Hokkaido 0408567, Japan

[2] Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan

Source:

SPEECH COMMUNICATION | 2016 / Vol. 83

Keywords:

Whispered speech; Pitch; Mel-scaled filter bank; Principal component analysis; Multiple regression analysis; RECONSTRUCTION; RECOGNITION; PROSODY; VOWELS

DOI:

10.1016/j.specom.2016.07.001

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We can perceive pitch in whispered speech, although fundamental frequency (F0) does not exist physically or phonetically because the vocal folds do not vibrate. This study was carried out to determine how people produce such an unvoiced pitch. We conducted experiments in which speakers uttered five whispered Japanese vowels while matching the pitch of a guide pure tone. From the results, we derived a multiple regression function that converts the outputs of a mel-scaled filter bank applied to whispered speech into a perceived pitch value. Next, using this estimated pitch value as F0, we constructed a system that converts whispered speech into normal speech. Since the estimated pitch varies over time with the spectral shape, we expected this conversion to preserve pitch accent. Indeed, auditory experiments showed that the rate at which Japanese word accent was correctly perceived increased from 55.5% with a constant F0 to 72.0% with the estimated pitch. (C) 2016 Elsevier B.V. All rights reserved.
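The abstract describes mapping the outputs of a mel-scaled filter bank applied to whispered speech onto a perceived pitch value with a multiple regression function. The sketch below illustrates that general idea only, under stated assumptions: a triangular mel filter bank with log energies, the common mel formula 2595·log10(1 + f/700), ordinary least squares as the regression, and hypothetical function names (`mel_filterbank_energies`, `fit_multiple_regression`, `predict_pitch`) and parameters (8 filters, 8 kHz sampling, 256-sample frames). None of these are the authors' reported configuration, and the principal component analysis listed in the keywords is omitted.

```python
import math
import random


def hz_to_mel(f_hz):
    # Common mel formula (an assumption; not necessarily the paper's mapping).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank_energies(frame, sample_rate, n_filters=8):
    """Log energies of triangular mel-spaced filters over one frame's power spectrum."""
    n = len(frame)
    # Power spectrum via a naive DFT over bins 0..n/2 (adequate for a short demo frame).
    power = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    bin_hz = sample_rate / n
    # Filter edge frequencies, equally spaced on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        e = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if lo < f <= mid:  # rising slope of the triangle
                e += p * (f - lo) / (mid - lo)
            elif mid < f < hi:  # falling slope of the triangle
                e += p * (hi - f) / (hi - mid)
        energies.append(math.log(e + 1e-10))  # small floor avoids log(0)
    return energies


def fit_multiple_regression(features, targets):
    """Ordinary least squares: returns [b0, b1, ..., bp] for y ~ b0 + sum(b_i * x_i)."""
    rows = [[1.0] + list(x) for x in features]
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        coef[i] = (m[i][p] - sum(m[i][j] * coef[j] for j in range(i + 1, p))) / m[i][i]
    return coef


def predict_pitch(coef, energies):
    """Apply the fitted regression to one frame's filter bank energies (pitch in Hz)."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], energies))


if __name__ == "__main__":
    # Hypothetical training data: random features with a known linear pitch relation.
    random.seed(0)
    X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
    y = [200 + 30 * a - 10 * b + 5 * c for a, b, c in X]
    print([round(c, 3) for c in fit_multiple_regression(X, y)])
```

In a setup like the one the abstract describes, the training pairs would be (filter bank energies of a whispered vowel, pitch of the guide tone the speaker matched); calling `predict_pitch` frame by frame would then give a time-varying pitch contour that could stand in for F0 during resynthesis. The resynthesis step itself is not sketched here.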


Pages: 10-20

Number of pages: 11
