Whisper to normal speech conversion using pitch estimated from spectrum

Cited by: 10
Authors
Konno, Hideaki [1]
Kudo, Mineichi [2]
Imai, Hideyuki [2]
Sugimoto, Masanori [2]
Affiliations
[1] Hokkaido Univ, Hakodate Campus, 1-2 Hachiman Cho, Hakodate, Hokkaido 0408567, Japan
[2] Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan
Keywords
Whispered speech; Pitch; Mel-scaled filter bank; Principal component analysis; Multiple regression analysis; RECONSTRUCTION; RECOGNITION; PROSODY; VOWELS
DOI
10.1016/j.specom.2016.07.001
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We can perceive pitch in whispered speech, although fundamental frequency (F0) does not exist physically or phonetically due to the lack of vocal-fold vibration. This study was carried out to determine how people generate such an unvoiced pitch. We conducted experiments in which speakers uttered five whispered Japanese vowels in accordance with the pitch of a guide pure tone. From the results, we derived a multiple regression function to convert the outputs of a mel-scaled filter bank of whispered speech into the perceived pitch value. Next, using this estimated pitch value as F0, we constructed a system for conversion of whispered speech to normal speech. Since the pitch varies with time according to the spectral shape, it was expected that the pitch accent would be kept by this conversion. Indeed, auditory experiments demonstrated that the correctly perceived rate of Japanese word accent was increased from 55.5% to 72.0% compared with that when a constant F0 was used. © 2016 Elsevier B.V. All rights reserved.
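The abstract does not give the form of the regression function, so the following is only a minimal sketch of one plausible reading: filter-bank outputs are reduced with principal component analysis (as the keywords suggest) and the component scores are mapped to a pitch value by ordinary least squares. The function names, the array shapes, the 24-channel filter bank, and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def fit_pitch_regressor(fbank, pitch, n_components):
    """Fit PCA followed by least-squares multiple regression.

    fbank: (n_frames, n_channels) filter-bank outputs (hypothetical input).
    pitch: (n_frames,) target pitch values, e.g. the guide-tone frequency.
    """
    mean = fbank.mean(axis=0)
    centered = fbank - mean
    # Principal axes are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # (n_components, n_channels)
    scores = centered @ components.T          # (n_frames, n_components)
    # Design matrix with an intercept column, solved by least squares.
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, pitch, rcond=None)
    return {"mean": mean, "components": components, "coef": coef}


def predict_pitch(model, fbank):
    """Map filter-bank frames to estimated pitch with a fitted model."""
    scores = (fbank - model["mean"]) @ model["components"].T
    return model["coef"][0] + scores @ model["coef"][1:]


# Synthetic demonstration only: the "pitch" here is an exact linear
# function of random features, not real whispered-speech data.
rng = np.random.default_rng(0)
n_frames, n_channels = 200, 24
fbank = rng.normal(size=(n_frames, n_channels))
true_w = rng.normal(size=n_channels)
pitch = 150.0 + fbank @ true_w

model = fit_pitch_regressor(fbank, pitch, n_components=n_channels)
estimate = predict_pitch(model, fbank)
```

In a conversion system of the kind the abstract describes, the per-frame `estimate` would play the role of the F0 contour supplied to a speech synthesizer. Keeping fewer components than channels trades some fit accuracy for a smaller, more stable regression.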
Pages: 10-20
Number of pages: 11