Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift

Times cited: 6
Authors
Matsui, Toshie [1]
Irino, Toshio [2]
Uemura, Ryo [2]
Yamamoto, Kodai [2]
Kawahara, Hideki [2]
Patterson, Roy D. [3]
Affiliations
[1] Toyohashi Univ Technol, Grad Sch Engn, 1-1 Hibarigaoka,Tempaku Cho, Toyohashi, Aichi 4418580, Japan
[2] Wakayama Univ, Fac Syst Engn, Sakaedani 930, Wakayama, Wakayama 6408510, Japan
[3] Univ Cambridge, Dept Physiol Dev & Neurosci, CNBH, Downing St, Cambridge CB2 3EG, England
Keywords
Size perception; Auditory model; Voiced and whispered speech; Speech spectrum; Psychometric function; Vocal tract; Time domain; Word recognition; Auditory filter; Perception; Frequency; Representations; Information; Assessments; Formants
DOI
10.1016/j.specom.2021.10.006
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We can estimate the size of a speaker solely from their speech sounds, regardless of whether the sounds are voiced or unvoiced. In this study, we developed a size perception model based on the computational theory of the stabilised wavelet transform (SWT) to explain a variety of size discrimination data. We also conducted extended experiments to evaluate the effect of spectral lift on speaker size discrimination for voiced and unvoiced speech sounds. The just noticeable difference (JND) and the point of subjective equality (PSE) for speaker size discrimination were compared between speech sounds with natural and lifted spectra. On average, listeners tended to judge that the lifted speech came from a smaller speaker. The PSE, which indicates the systematic difference in perceived size, shifted by approximately 10% for unvoiced speech sounds (Exp. 1) and by approximately 5% for voiced speech sounds (Exp. 2). The JND depended on the spectral lift for unvoiced sounds but not for voiced sounds. At the same time, there were large differences between listeners: some listeners' judgements were affected by the spectral lift, while others' were not. We constructed a size discrimination model that explains all of the experimental results, including this listener dependence, for both voiced and unvoiced speech sounds. We introduced a weighting function, based on the Size-Shape Image (SSI) in the SWT, that reduces the influence of the resolved harmonics produced by the glottal pulse sequence in voiced speech. As a result, the model with the SSI weighting function predicted individual listeners' data fairly well, whether or not their judgements were affected by the spectral lift and whether the speech sounds were voiced or unvoiced. Choosing the optimum value of a single parameter, the spectral compensation coefficient, enabled the model to explain the data of every individual listener.
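JND and PSE values such as those in the abstract are conventionally read off a psychometric function fitted to each listener's two-alternative judgements. The following is a minimal sketch of that generic step, not the authors' analysis. It assumes a logistic psychometric function over the comparison-minus-standard size difference in percent, maximum-likelihood fitting by exhaustive grid search, the PSE defined as the 50% point, and the JND defined as the distance from the 50% to the 75% point. The function names (`logistic`, `fit_psychometric`, `jnd_from_scale`) and the response counts are hypothetical; the counts are synthetic, not data from the paper.

```python
import math
from typing import List, Sequence, Tuple


def logistic(x: float, pse: float, scale: float) -> float:
    """Assumed P('comparison sounds larger') at size difference x (percent)."""
    return 1.0 / (1.0 + math.exp(-(x - pse) / scale))


def neg_log_likelihood(xs: Sequence[float], n_larger: Sequence[int],
                       n_trials: Sequence[int], pse: float, scale: float) -> float:
    """Binomial negative log-likelihood of the observed response counts."""
    nll = 0.0
    for x, k, n in zip(xs, n_larger, n_trials):
        p = logistic(x, pse, scale)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_psychometric(xs: Sequence[float], n_larger: Sequence[int],
                     n_trials: Sequence[int], pse_grid: List[float],
                     scale_grid: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood (pse, scale), found by exhaustive grid search."""
    _, pse, scale = min(
        (neg_log_likelihood(xs, n_larger, n_trials, m, s), m, s)
        for m in pse_grid
        for s in scale_grid
    )
    return pse, scale


def jnd_from_scale(scale: float) -> float:
    """75% point minus 50% point of a logistic, which equals scale * ln(3)."""
    return scale * math.log(3.0)


# Hypothetical synthetic data: 100 trials per level, generated from a
# logistic with PSE = 4 % and scale = 5 % (not data from the paper).
xs = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
n_trials = [100] * len(xs)
n_larger = [round(100 * logistic(x, 4.0, 5.0)) for x in xs]

pse_grid = [i / 10 for i in range(-150, 151)]  # -15 % ... +15 %
scale_grid = [i / 10 for i in range(10, 151)]  # 1 % ... 15 %
pse_hat, scale_hat = fit_psychometric(xs, n_larger, n_trials, pse_grid, scale_grid)
jnd_hat = jnd_from_scale(scale_hat)
print(f"PSE = {pse_hat:.1f} %, JND = {jnd_hat:.1f} %")
```

Under these assumptions, fitting the natural and lifted conditions separately would express a systematic bias as a difference between their fitted PSEs and a change in discriminability as a difference between their JNDs. A cumulative Gaussian is an equally common choice of psychometric function. Grid search is used here only to keep the sketch dependency-free.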
Pages: 23-41
Number of pages: 19
Related papers (6)
  • [1] Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift, Speech Communication (vol 136, pg 23, 2022)
    Matsui, Toshie
    Irino, Toshio
    Uemura, Ryo
    Yamamoto, Kodai
    Kawahara, Hideki
    Patterson, Roy D.
    Speech Communication, 2023, 147: 116-117
  • [2] The effect of spectral tilt on size discrimination of voiced speech sounds
    Matsui, Toshie
    Irino, Toshio
    Yamamoto, Kodai
    Kawahara, Hideki
    Patterson, Roy D.
    18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 601-605
  • [3] An auditory model of speaker size perception for voiced speech sounds
    Irino, Toshio
    Takimoto, Eri
    Matsui, Toshie
    Patterson, Roy D.
    18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 1153-1157
  • [4] Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency
    David, Marion
    Lavandier, Mathieu
    Grimault, Nicolas
    Oxenham, Andrew J.
    Hearing Research, 2017, 344: 235-243
  • [5] Clustering based Voiced-Unvoiced-Silence Detection in Speech using Temporal and Spectral Parameters
    Mondal, Sujoy
    Das Barman, Abhirup
    2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), 2015: 390-394
  • [6] Discrimination and streaming of speech sounds based on differences in interaural and spectral cues
    David, Marion
    Lavandier, Mathieu
    Grimault, Nicolas
    Oxenham, Andrew J.
    Journal of the Acoustical Society of America, 2017, 142(3): 1674-1685