Whisper to normal speech conversion using pitch estimated from spectrum

Cited by: 10
Authors
Konno, Hideaki [1]
Kudo, Mineichi [2]
Imai, Hideyuki [2]
Sugimoto, Masanori [2]
Affiliations
[1] Hokkaido Univ, Hakodate Campus,1-2 Hachiman Cho, Hakodate, Hokkaido 0408567, Japan
[2] Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan
Keywords
Whispered speech; Pitch; Mel-scaled filter bank; Principal component analysis; Multiple regression analysis; RECONSTRUCTION; RECOGNITION; PROSODY; VOWELS;
DOI
10.1016/j.specom.2016.07.001
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We can perceive pitch in whispered speech, although fundamental frequency (F0) does not exist physically or phonetically due to the lack of vocal-fold vibration. This study was carried out to determine how people generate such an unvoiced pitch. We conducted experiments in which speakers uttered five whispered Japanese vowels in accordance with the pitch of a guide pure tone. From the results, we derived a multiple regression function that converts the outputs of a mel-scaled filter bank applied to whispered speech into the perceived pitch value. Next, using this estimated pitch value as F0, we constructed a system for converting whispered speech to normal speech. Since the estimated pitch varies with time according to the spectral shape, the pitch accent was expected to be preserved by this conversion. Indeed, auditory experiments demonstrated that the rate at which Japanese word accent was correctly perceived increased from 55.5%, obtained when a constant F0 was used, to 72.0%. © 2016 Elsevier B.V. All rights reserved.
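The regression step described in the abstract can be illustrated with a minimal sketch. It assumes triangular mel filters, log-compressed filter outputs, and an ordinary least-squares fit with an intercept; the filter count, frame length, sample rate, and all function names below are hypothetical choices for illustration and are not taken from the paper, whose actual features and coefficients are not given in this record.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper's variant is not stated here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_filter_outputs(frame, bank):
    """Log energy in each mel band for one windowed frame."""
    n_fft = 2 * (bank.shape[1] - 1)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)) ** 2
    return np.log(bank @ power + 1e-10)

def fit_pitch_regression(features, pitches):
    """Least-squares multiple regression: pitch ~ features @ w + b."""
    design = np.hstack([features, np.ones((len(features), 1))])
    coeffs, *_ = np.linalg.lstsq(design, pitches, rcond=None)
    return coeffs  # last entry is the intercept b

def predict_pitch(features, coeffs):
    return features @ coeffs[:-1] + coeffs[-1]

# Usage on placeholder data: random noise frames stand in for whispered speech,
# and the target pitches are random numbers, not measurements.
rng = np.random.default_rng(0)
bank = mel_filter_bank(n_filters=8, n_fft=512, sample_rate=16000)
frames = rng.normal(size=(50, 400))
features = np.array([log_filter_outputs(f, bank) for f in frames])
target_pitch = rng.uniform(100.0, 300.0, size=50)
coeffs = fit_pitch_regression(features, target_pitch)
estimated_f0 = predict_pitch(features, coeffs)
```

In a real system the target values would be the pitches of the guide tones that speakers matched while whispering, and the per-frame estimates would then drive the F0 of a synthesizer. The keywords also mention principal component analysis, which this sketch omits.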
Pages: 10-20
Number of pages: 11
Related papers
50 in total
  • [41] Whisper to Normal Speech Based on Deep Neural Networks with MCC and F0 Features
    Lian, Hailun
    Hu, Yuting
    Zhou, Jian
    Wang, Huabin
    Tao, Liang
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [42] A Novel Model-based Pitch Conversion Method for Mandarin Speech
    Hwang, Hsin-Te
    Chiang, Chen-Yu
    Sung, Po-Yi
    Chen, Sin-Horng
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2611 - 2614
  • [43] Pitch estimation using a modulation model of speech
    Gopalan, K
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 786 - 791
  • [44] Speech enhancement using a pitch predictive model
    Buera, Luis
    Droppo, Jasha
    Acero, Alex
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4885 - +
  • [45] Extraction of voiced regions of speech from emotional speech signals using wavelet-pitch method
    Dendukuri L.S.
    Hussain S.J.
    Periodica polytechnica Electrical engineering and computer science, 2021, 65 (03): : 262 - 278
  • [46] Pitch Estimation Using Harmonic Product Spectrum derived from DCT
    Sripriya, N.
    Nagarajan, T.
    2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON), 2013,
  • [47] Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings
    Kodali, Manila
    Kadiri, Sudarsana Reddy
    Alku, Paavo
    INTERSPEECH 2023, 2023, : 4134 - 4138
  • [48] Pitch Estimation and Voicing Classification Using Reconstructed Spectrum from MFCC
    Wu, JianFeng
    Qin, HuiBin
    Hua, YongZhu
    Fan, LingYan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 556 - 559
  • [49] Pitch Estimation in Noisy Speech Based on Temporal Accumulation of Spectrum Peaks
    Huang, Feng
    Lee, Tan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 641 - 644
  • [50] Speaker adaptation of pitch and spectrum for HMM-based speech synthesis
    Tamura, M., 1600, John Wiley and Sons Inc. (35):