Measuring Speech Recognition With a Matrix Test Using Synthetic Speech

Cited by: 18

Authors:

Nuesse, Theresa [1,2]

Wiercinski, Bianca [1]

Brand, Thomas [2,3]

Holube, Inga [1,2]

Affiliations:

[1] Jade Univ Appl Sci, Inst Hearing Technol & Audiol, Ofener Str 16-19, D-26121 Oldenburg, Germany

[2] Cluster Excellence Hearing4All, Oldenburg, Germany

[3] Carl von Ossietzky Univ Oldenburg, Med Phys, Oldenburg, Germany

Source:

TRENDS IN HEARING | 2019 / Vol. 23

Keywords:

speech audiometry; speech reception threshold; Oldenburg sentence test; text-to-speech; synthetic speech; COGNITIVE LOAD; SENTENCE TEST; INTELLIGIBILITY; NOISE;

DOI:

10.1177/2331216519862982

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Speech audiometry is an essential part of audiological diagnostics and clinical measurements. Development times of speech recognition tests are rather long, depending on the size of speech corpus and optimization necessity. The aim of this study was to examine whether this development effort could be reduced by using synthetic speech in speech audiometry, especially in a matrix test for speech recognition. For this purpose, the speech material of the German matrix test was replicated using a preselected commercial system to generate the synthetic speech files. In contrast to the conventional matrix test, no level adjustments or optimization tests were performed while producing the synthetic speech material. Evaluation measurements were conducted by presenting both versions of the German matrix test (with natural or synthetic speech), alternately and at three different signal-to-noise ratios, to 48 young, normal-hearing participants. Psychometric functions were fitted to the empirical data. Speech recognition thresholds were 0.5 dB signal-to-noise ratio higher (worse) for the synthetic speech, while slopes were equal for both speech types. Nevertheless, speech recognition scores were comparable with the literature and the threshold difference lay within the same range as recordings of two different natural speakers. Although no optimization was applied, the synthetic-speech signals led to equivalent recognition of the different test lists and word categories. The outcomes of this study indicate that the application of synthetic speech in speech recognition tests could considerably reduce the development costs and evaluation time. This offers the opportunity to increase the speech corpus for speech recognition tests with acceptable effort.
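The abstract reports that psychometric functions were fitted to recognition scores measured at three signal-to-noise ratios, yielding a speech recognition threshold (SRT) and a slope. The fitting procedure itself is not described here, so the following is only a minimal sketch under stated assumptions: a logistic model parameterized by the SRT (SNR at 50% correct) and the slope at the SRT, fitted by linear regression on logit-transformed scores. The function names, the SNR values, and the parameter values are hypothetical and are not taken from the study.

```python
import math


def logistic(snr_db, srt_db, slope):
    """Proportion correct at snr_db for an assumed logistic psychometric function.

    srt_db is the SNR giving 50% correct; slope is the gradient at the SRT
    (proportion correct per dB).
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr_db - srt_db)))


def fit_logistic(snrs_db, scores):
    """Estimate (srt_db, slope) from scores strictly between 0 and 1.

    Uses logit(p) = 4 * slope * (snr - srt), which is linear in snr, and
    solves it by ordinary least squares.
    """
    logits = [math.log(p / (1.0 - p)) for p in scores]
    n = len(snrs_db)
    mean_x = sum(snrs_db) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in snrs_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snrs_db, logits))
    gradient = sxy / sxx                  # equals 4 * slope
    intercept = mean_y - gradient * mean_x
    return -intercept / gradient, gradient / 4.0


# Hypothetical example: three SNRs and noise-free scores generated from
# made-up parameters (not values reported in the study).
example_snrs = [-9.0, -7.0, -5.0]
example_scores = [logistic(x, -7.0, 0.17) for x in example_snrs]
example_srt, example_slope = fit_logistic(example_snrs, example_scores)
```

With real data, scores of exactly 0 or 1 would need to be handled before the logit transform, and a maximum-likelihood fit is a common alternative to this least-squares shortcut.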


Pages: 14

50 items in total

[31] Robust recognition of noisy speech using speech enhancement
Xu, YF
Zhang, JJ
Yao, KS
Cao, ZG
Ma, ZX
2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
[32] Measuring Mandarin Speech Recognition Thresholds Using the Method of Adaptive Tracking
Wang, Yuxia
Lu, Zhaoyu
Yang, Xiaohu
Liu, Chang
JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2019, 62 (06): : 2009 - 2017
[33] Measuring the Randomness of Speech Cues for Emotion Recognition
Susan, Seba
Kaur, Amandeep
2017 TENTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2017, : 78 - 83
[34] Measuring the Accuracy of Automatic Speech Recognition Solutions
Kuhn, Korbinian
Kersken, Verena
Reuter, Benedikt
Egger, Niklas
Zimmermann, Gottfried
ACM TRANSACTIONS ON ACCESSIBLE COMPUTING, 2023, 16 (04)
[35] SynthASR: Unlocking Synthetic Data for Speech Recognition
Fazel, Amin
Yang, Wei
Liu, Yulan
Barra-Chicote, Roberto
Meng, Yixiong
Maas, Roland
Droppo, Jasha
INTERSPEECH 2021, 2021, : 896 - 900
[36] SPEECH RECOGNITION WITH NO SPEECH OR WITH NOISY SPEECH
Krishna, Gautam
Co Tran
Yu, Jianguo
Tewfik, Ahmed H.
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1090 - 1094
[37] An Emotion Estimation from Human Speech Using Speech Recognition and Speech Synthesize
Kurematsu, Masaki
Ohashi, Marina
Kinosita, Orimi
Hakura, Jun
Fujita, Hamido
NEW TRENDS IN SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES, 2008, 182 : 278 - 289
[38] Robust speech recognition using fuzzy matrix quantisation and neural networks
Xydeas, CS
Lin, C
1996 INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, VOLUMES 1 AND 2 - PROCEEDINGS, 1996, : 432 - 435
[39] FEATURE-EXTRACTION USING A MATRIX COEFFICIENT FILTER FOR SPEECH RECOGNITION
KATAGISHI, K
SINGER, H
AIKAWA, K
SAGAYAMA, S
SPEECH COMMUNICATION, 1993, 13 (3-4) : 297 - 306
[40] Matrix sentence intelligibility prediction using an automatic speech recognition system
Schaedler, Marc Rene
Warzybok, Anna
Hochmuth, Sabine
Kollmeier, Birger
INTERNATIONAL JOURNAL OF AUDIOLOGY, 2015, 54 : 100 - 107
