Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift

Times cited: 6

Authors:

Matsui, Toshie [1]
Irino, Toshio [2]
Uemura, Ryo [2]
Yamamoto, Kodai [2]
Kawahara, Hideki [2]
Patterson, Roy D. [3]

Affiliations:

[1] Toyohashi Univ Technol, Grad Sch Engn, 1-1 Hibarigaoka, Tempaku Cho, Toyohashi, Aichi 4418580, Japan

[2] Wakayama Univ, Fac Syst Engn, Sakaedani 930, Wakayama, Wakayama 6408510, Japan

[3] Univ Cambridge, Dept Physiol Dev & Neurosci, CNBH, Downing St, Cambridge CB2 3EG, England

Source:

SPEECH COMMUNICATION | 2022 / Vol. 136

Keywords:

Size perception; Auditory model; Voiced and whispered speech; Speech spectrum; Psychometric function; VOCAL-TRACT; TIME-DOMAIN; WORD RECOGNITION; AUDITORY FILTER; PERCEPTION; FREQUENCY; REPRESENTATIONS; INFORMATION; ASSESSMENTS; FORMANTS;

DOI:

10.1016/j.specom.2021.10.006

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403

Abstract:

We can estimate the size of a speaker solely from their speech sounds, regardless of whether the sounds are voiced or unvoiced. In this study, we developed a size perception model based on the computational theory of the stabilised wavelet transform (SWT) to explain a variety of size discrimination data. We also conducted extended experiments to evaluate the effect of spectral lift on speaker-size discrimination with voiced and unvoiced speech sounds. The just noticeable difference (JND) and the point of subjective equality (PSE) for speaker-size discrimination were compared between speech sounds with natural and lifted spectra. On average, listeners tended to judge that the lifted speech came from a smaller speaker. The PSE, which indicates the systematic difference in perceived size, shifted by approximately 10% for unvoiced speech sounds (Exp. 1) and by approximately 5% for voiced speech sounds (Exp. 2). The JND depended on the spectral lift for unvoiced sounds, but not for voiced sounds. At the same time, there were large differences between listeners: the judgements of some listeners were affected by the spectral lift, while those of others were not. We constructed a size discrimination model that explains all of the experimental results, including this listener dependence, for both voiced and unvoiced speech sounds. We introduced a weighting function, based on the Size-Shape Image (SSI) in the SWT, which reduces the influence of the resolved harmonics caused by the glottal pulse sequence in voiced speech. As a result, the model with the SSI weighting function predicted individual listeners' data fairly well, whether or not their judgements were affected by the spectral lift, and whether the speech sounds were voiced or unvoiced. Choosing the optimum value of a single parameter, the spectral compensation coefficient, enabled the model to explain the data of every individual listener.
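The abstract reports PSE and JND values, which are read off a fitted psychometric function. As a hedged illustration only, the sketch below assumes a cumulative-Gaussian psychometric function fitted to the proportion of "comparison judged larger" responses as a function of the natural-log size ratio (comparison / standard). It also assumes the JND is defined as the distance from the 50% point to the 75% point. The function names (`fit_cumulative_gaussian`, `pse_and_jnd`, `log_ratio_to_percent`) and all of these conventions are hypothetical choices for illustration, not the fitting procedure used in the paper. The sketch uses quick least squares on probit-transformed proportions, whereas maximum-likelihood fitting is more usual in practice.

```python
import math
from statistics import NormalDist, fmean

# z-score of the 75% point of a standard normal distribution (about 0.6745).
Z75 = NormalDist().inv_cdf(0.75)


def fit_cumulative_gaussian(x, p, eps=1e-3):
    """Fit p = Phi((x - mu) / sigma) by least squares on probit-transformed
    proportions (a quick approximation; maximum likelihood is more usual).

    x: stimulus values, assumed here to be ln(comparison size / standard size).
    p: proportion of trials on which the comparison was judged larger.
    Returns (mu, sigma).
    """
    if len(x) != len(p) or len(x) < 2:
        raise ValueError("need at least two (x, p) pairs of equal length")
    std = NormalDist()
    # Clip proportions away from 0 and 1 so the probit transform stays finite.
    z = [std.inv_cdf(min(max(pi, eps), 1.0 - eps)) for pi in p]
    mx, mz = fmean(x), fmean(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("stimulus values must not all be equal")
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx            # estimates 1 / sigma
    intercept = mz - slope * mx  # estimates -mu / sigma
    if slope <= 0:
        raise ValueError("responses do not increase with x; cannot fit")
    return -intercept / slope, 1.0 / slope


def pse_and_jnd(x, p):
    """Return (PSE, JND) on the x scale.

    PSE: the x value at which the fitted p equals 0.5.
    JND: distance from the 50% to the 75% point (one common convention).
    """
    mu, sigma = fit_cumulative_gaussian(x, p)
    return mu, sigma * Z75


def log_ratio_to_percent(d):
    """Convert a natural-log size ratio into a percentage size change."""
    return (math.exp(d) - 1.0) * 100.0


if __name__ == "__main__":
    # Synthetic, noise-free proportions for illustration only (not data from the study).
    ratios = [0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]
    xs = [math.log(r) for r in ratios]
    ps = [NormalDist(0.03, 0.12).cdf(v) for v in xs]
    pse, jnd = pse_and_jnd(xs, ps)
    print(f"PSE = {pse:.4f} ({log_ratio_to_percent(pse):+.1f}% size change)")
    print(f"JND = {jnd:.4f} ({log_ratio_to_percent(jnd):.1f}%)")
```

Working on a log-ratio scale treats equal proportional size changes symmetrically. Converting back with `log_ratio_to_percent` expresses a PSE shift as a percentage, the same form as the approximately 10% and 5% shifts reported in the abstract.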


Pages: 23-41

Page count: 19
