The Effects of Modulating Fundamental Frequency and Speech Rate on the Intelligibility, Communication Efficiency, and Perceived Naturalness of Synthetic Speech

Times cited: 10
Authors
Vojtech, Jennifer M. [1, 2]
Noordzij, Jacob P., Jr. [1, 2]
Cler, Gabriel J. [2, 3]
Stepp, Cara E. [1, 2, 3, 4]
Affiliations
[1] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
[2] Boston Univ, Dept Speech Language & Hearing Sci, Boston, MA 02215 USA
[3] Boston Univ, Grad Program Neurosci Computat Neurosci, Boston, MA 02215 USA
[4] Boston Univ, Sch Med, Dept Otolaryngol Head & Neck Surg, Boston, MA 02215 USA
Funding
National Institutes of Health; National Science Foundation
Keywords
QUALITY-OF-LIFE; SENTENCE INTELLIGIBILITY; PARKINSONS-DISEASE; RATE REDUCTION; SLOW SPEECH; SPEAKERS; IMPACT; SEVERITY; CLEAR; LOUD
DOI
10.1044/2019_AJSLP-MSC18-18-0052
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech.

Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli.

Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level.

Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off in naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs.

Supplemental Material: https://doi.org/10.23641/asha.8847833
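The abstract does not state how intelligibility or communication efficiency were computed from the listeners' transcriptions. As a minimal sketch only, assuming intelligibility is scored as the percentage of target words correctly transcribed, and assuming communication efficiency is expressed as intelligible words per minute (speaking rate scaled by intelligibility, a formulation used in the dysarthria literature), the two measures could be computed as below. The function names and the order-insensitive word matching are hypothetical simplifications, not the authors' scoring procedure.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def percent_words_correct(target: str, transcript: str) -> float:
    """Hypothetical scorer: percentage of target words found in a listener's
    orthographic transcription. Each transcribed word can credit at most one
    target word; word order is ignored (a simplifying assumption)."""
    target_words = tokenize(target)
    if not target_words:
        return 0.0
    remaining = Counter(tokenize(transcript))
    correct = 0
    for word in target_words:
        if remaining[word] > 0:
            remaining[word] -= 1
            correct += 1
    return 100.0 * correct / len(target_words)


def intelligible_words_per_minute(
    intelligibility_pct: float, n_words: int, duration_s: float
) -> float:
    """Assumed efficiency measure: speaking rate in words per minute,
    multiplied by the proportion of words the listener understood."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    words_per_minute = n_words * 60.0 / duration_s
    return words_per_minute * intelligibility_pct / 100.0


if __name__ == "__main__":
    target = "The cat sat on the mat"
    heard = "the cat sat on a mat"
    score = percent_words_correct(target, heard)
    print(f"intelligibility: {score:.1f}%")  # 5 of 6 words -> 83.3%
    iwpm = intelligible_words_per_minute(score, len(tokenize(target)), duration_s=3.0)
    print(f"intelligible words per minute: {iwpm:.1f}")  # 120 wpm * 0.833 -> 100.0
```

Under this assumed definition, a slower rate lowers words per minute, so it can only raise efficiency if it raises intelligibility enough to compensate, which is one way to read the abstract's finding that normal-rate, fixed-f0 sentences were most efficient.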
Pages: 875-886
Number of pages: 12