Whisper to normal speech conversion using pitch estimated from spectrum

Cited by: 10
Authors
Konno, Hideaki [1 ]
Kudo, Mineichi [2 ]
Imai, Hideyuki [2 ]
Sugimoto, Masanori [2 ]
Affiliations
[1] Hokkaido Univ, Hakodate Campus, 1-2 Hachiman Cho, Hakodate, Hokkaido 0408567, Japan
[2] Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan
Keywords
Whispered speech; Pitch; Mel-scaled filter bank; Principal component analysis; Multiple regression analysis; RECONSTRUCTION; RECOGNITION; PROSODY; VOWELS
DOI
10.1016/j.specom.2016.07.001
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We can perceive pitch in whispered speech, even though a fundamental frequency (F0) does not exist physically or phonetically because there is no vocal-fold vibration. This study was carried out to determine how people generate such an unvoiced pitch. We conducted experiments in which speakers uttered the five whispered Japanese vowels in accordance with the pitch of a guide pure tone. From the results, we derived a multiple regression function that converts the outputs of a mel-scaled filter bank applied to whispered speech into the perceived pitch value. Next, using this estimated pitch value as F0, we constructed a system for converting whispered speech to normal speech. Since the estimated pitch varies over time with the spectral shape, the pitch accent was expected to be preserved by this conversion. Indeed, auditory experiments showed that the rate of correctly perceived Japanese word accent increased from 55.5% to 72.0% compared with conversion using a constant F0. © 2016 Elsevier B.V. All rights reserved.
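As a rough illustration of the pipeline the abstract describes (log mel filter-bank outputs of whispered frames, reduced by principal component analysis and mapped by multiple regression onto a perceived pitch value), the following minimal Python sketch uses synthetic stand-in data. The filter-bank size, PCA dimension, and training targets below are placeholders chosen for illustration, not values or data from the paper.

```python
# Minimal sketch (not the authors' implementation) of spectrum-based pitch estimation
# for whispered speech: log mel filter-bank outputs -> PCA -> multiple regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Stand-in data: per-frame log mel filter-bank outputs of whispered vowels
# (n_frames x n_mels) and the pitch (Hz) each frame was intended/perceived at.
# In the study these targets came from utterances matched to a guide pure tone.
n_frames, n_mels = 500, 20
X = rng.normal(size=(n_frames, n_mels))                 # placeholder filter-bank features
true_w = rng.normal(size=n_mels)                        # synthetic spectrum-to-pitch weights
y = 180.0 + 10.0 * (X @ true_w) + rng.normal(scale=5.0, size=n_frames)  # synthetic pitch (Hz)

# PCA followed by multiple regression, as named in the paper's keywords.
model = make_pipeline(PCA(n_components=8), LinearRegression())
model.fit(X, y)

# Frame-wise pitch estimates for new whispered frames.
f0_hat = model.predict(rng.normal(size=(10, n_mels)))
print(np.round(f0_hat, 1))
```

In the conversion system described above, such a frame-wise estimated pitch contour would stand in for F0 during voiced re-synthesis, in place of the constant F0 used as the baseline in the listening test.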
Pages: 10 - 20
Number of pages: 11
Related papers
50 records in total
  • [1] Reconstruction of pitch for whisper-to-speech conversion of Chinese
    Li, Jingjie
    McLoughlin, Ian Vince
    Song, Yan
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 206 - 210
  • [2] Whisper to normal speech conversion using deep convolutional neural networks
    Lian, Hailun
    Zhou, Jian
    Hu, Yuting
    Zheng, Wenming
    Shengxue Xuebao/Acta Acustica, 2020, 45 (01): 137 - 144
  • [3] Whisper to Normal Speech Conversion Using Sequence-to-Sequence Mapping Model With Auditory Attention
    Lian, Hailun
    Hu, Yuting
    Yu, Weiwei
    Zhou, Jian
    Zheng, Wenming
    IEEE ACCESS, 2019, 7 : 130495 - 130504
  • [4] Effectiveness of Cross-Domain Architectures for Whisper-to-Normal Speech Conversion
    Parmar, Mihir
    Doshi, Savan
    Shah, Nirmesh J.
    Patel, Maitreya
    Patil, Hemant A.
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [5] WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
    Rekimoto, Jun
    PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023, 2023,
  • [6] Whisper-to-speech conversion using restricted Boltzmann machine arrays
    Li, Jing-jie
    McLoughlin, Ian V.
    Dai, Li-Rong
    Ling, Zhen-hua
    ELECTRONICS LETTERS, 2014, 50 (24) : 1781 - U141
  • [7] Whisper tech turns secrets into normal speech
    Adee, Sally
    NEW SCIENTIST, 2016, 231 (3088) : 21 - 21
  • [8] A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion
    Gao, Teng
    Pan, Qing
    Zhou, Jian
    Wang, Huabin
    Tao, Liang
    Kwan, Hon Keung
    COGNITIVE COMPUTATION, 2023, 15 (02) : 778 - 792
  • [9] Glottal Flow Synthesis for Whisper-to-Speech Conversion
    Perrotin, Olivier
    McLoughlin, Ian V.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 889 - 900