Partial compensation for coarticulatory vowel nasalization across concatenative and neural text-to-speech

Cited by: 8
Authors
Zellou, Georgia [1]
Cohn, Michelle [1]
Block, Aleese [1]
Affiliation
[1] Univ Calif Davis, Linguist Dept, Phonet Lab, 469 Kerr Hall, One Shields Ave, Davis, CA 95616 USA
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2021 / Vol. 149 / Issue 5
Funding
U.S. National Science Foundation (NSF)
Keywords
NASAL COARTICULATION; PERCEPTION; ENGLISH; HEIGHT
DOI
10.1121/10.0004989
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This study investigates the perception of coarticulatory vowel nasality generated using different text-to-speech (TTS) methods in American English. Experiment 1 compared concatenative and neural TTS using a 4IAX task, where listeners discriminated between a word pair containing either both oral or nasalized vowels and a word pair containing one oral and one nasalized vowel. Vowels occurred either in identical or alternating consonant contexts across pairs to reveal perceptual sensitivity and compensatory behavior, respectively. For identical contexts, listeners were better at discriminating between oral and nasalized vowels in neural than in concatenative TTS for nasalized same-vowel trials, but better discrimination for concatenative TTS was observed for oral same-vowel trials. Meanwhile, listeners displayed less compensation for coarticulation in neural than in concatenative TTS. To determine whether apparent roboticity of the TTS voice shapes vowel discrimination and compensation patterns, a "roboticized" version of neural TTS was generated (monotonized f0 and addition of an echo), holding phonetic nasality constant; a ratings study (experiment 2) confirmed that the manipulation resulted in different apparent roboticity. Experiment 3 compared the discrimination of unmodified neural TTS and roboticized neural TTS: listeners displayed lower accuracy in identical contexts for roboticized relative to unmodified neural TTS, yet the performances in alternating contexts were similar.
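The 4IAX design summarized in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' stimuli or analysis code: the labels `O`/`N`, the `Trial` record, and the helpers `four_iax_orders` and `accuracy` are hypothetical, and the sketch assumes the conventional eight 4IAX presentation orders (the abstract does not list the orders actually used). Each trial combines a "same" pair (both oral or both nasalized) with a "different" pair (one oral, one nasalized); the listener reports which pair differs, and accuracy is split by consonant context and by the vowel type of the same pair, mirroring the comparisons the abstract reports.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Hypothetical stimulus labels: "O" = oral-vowel token, "N" = nasalized-vowel token.
ORAL, NASAL = "O", "N"


@dataclass(frozen=True)
class Trial:
    first: tuple         # first pair of stimuli, e.g. ("O", "O")
    second: tuple        # second pair of stimuli, e.g. ("O", "N")
    different_pair: int  # 1 or 2: the pair holding one oral and one nasalized token
    same_vowel: str      # vowel type of the "same" pair (ORAL or NASAL)
    context: str         # assumed tag: "identical" or "alternating" consonant context


def four_iax_orders(context: str) -> list:
    """Enumerate the 8 conventional 4IAX orders for an oral vs. nasalized contrast."""
    same_pairs = [(ORAL, ORAL), (NASAL, NASAL)]
    diff_pairs = [(ORAL, NASAL), (NASAL, ORAL)]
    trials = []
    for same, diff in product(same_pairs, diff_pairs):
        trials.append(Trial(same, diff, 2, same[0], context))  # "different" pair second
        trials.append(Trial(diff, same, 1, same[0], context))  # "different" pair first
    return trials


def accuracy(trials, responses) -> dict:
    """Proportion correct per (context, same-vowel type); each response is a pair index (1 or 2)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for trial, response in zip(trials, responses):
        key = (trial.context, trial.same_vowel)
        totals[key] += 1
        hits[key] += int(response == trial.different_pair)
    return {key: hits[key] / totals[key] for key in totals}


# Usage: a simulated listener who always picks the correct pair.
trials = four_iax_orders("identical") + four_iax_orders("alternating")
print(accuracy(trials, [t.different_pair for t in trials]))
```

Splitting scores by context is what separates the two questions in the abstract: identical-context accuracy indexes perceptual sensitivity, while lower accuracy in alternating contexts would be consistent with listeners compensating for coarticulation. The sketch does not model stimulus synthesis, the roboticization manipulation, or the statistical analysis.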
Pages: 3424-3436
Number of pages: 13