On the Impact of Children's Emotional Speech on Acoustic and Language Models

Authors
Stefan Steidl
Anton Batliner
Dino Seppi
Björn Schuller
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung
[2] Katholieke Universiteit Leuven, ESAT
[3] Technische Universität München, Institute for Human-Machine Communication
Keywords
Language Model; Automatic Speech Recognition; Acoustic Model; Baseline System; Emotional Speech
DOI
Not available
Abstract
The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous motherese, emphatic, and angry children's speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, "emotional" speech recognisers are compared to a speech recogniser trained on neutral speech only. For this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a "neutral" speech recogniser trained on large amounts of neutral speech is adapted by adding only a small amount of emotionally coloured data in the training process. The results show that emphatic and angry speech is recognised best, even better than neutral speech, and that performance can be improved further by adapting the acoustic and linguistic models. To show the variability of emotional speech, we visualise the distribution of the four emotion-related states in the MFCC space by applying a Sammon transformation.
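The Sammon transformation mentioned in the abstract is a non-linear projection that maps high-dimensional points (here, MFCC vectors) to 2-D while preserving pairwise distances. As a rough illustration of the idea only (the function name, parameters, and the plain gradient-descent optimiser are assumptions, not the paper's actual procedure), a minimal sketch with NumPy could look like:

```python
import numpy as np

def sammon(X, n_iter=300, lr=0.3, eps=1e-9, seed=0):
    """Project X (n samples x d features) to 2-D by minimising Sammon stress
    with plain gradient descent (the classic algorithm uses a Newton-like step)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # pairwise distances in the original (e.g. MFCC) space
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D = np.maximum(D, eps)
    # normalising constant c = 1 / sum of original distances over all pairs
    scale = 1.0 / D[np.triu_indices(n, 1)].sum()
    # small random initial 2-D embedding
    Y = rng.standard_normal((n, 2)) * 1e-2
    for _ in range(n_iter):
        d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        d = np.maximum(d, eps)
        # gradient of the Sammon stress  E = c * sum_{i<j} (D_ij - d_ij)^2 / D_ij
        ratio = (D - d) / (D * d)
        np.fill_diagonal(ratio, 0.0)
        grad = -2.0 * scale * (ratio[:, :, None]
                               * (Y[:, None, :] - Y[None, :, :])).sum(1)
        Y -= lr * grad
    return Y
```

Applied per frame to MFCC vectors of the four emotion-related states, such a projection gives the kind of 2-D scatter plot the paper uses to visualise their overlap.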
Related Papers
50 items in total
  • [41] UK speech and language therapists' assessment of children's expressive language, and functional impairment and impact, following the CATALISE publications
    Waine, Hannah
    Bates, Sally
    Frizelle, Pauline
    Oh, Tomasina M. M.
    INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS, 2023, 58 (05) : 1570 - 1587
  • [42] Dependence of the perception of emotional information of speech on the acoustic parameters of the stimulus in children of various ages
    Dmitrieva E.S.
    Gel'man V.Ya.
    Zaitseva K.A.
    Orlov A.M.
    Human Physiology, 2008, 34 (4) : 527 - 531
  • [43] The challenges of evaluation: assessing Early Talk's impact on speech language and communication practice in children's centres
    Jopling, Michael
    Whitmarsh, Judy
    Hadfield, Mark
    INTERNATIONAL JOURNAL OF EARLY YEARS EDUCATION, 2013, 21 (01) : 70 - 84
  • [44] Creating Language and Acoustic Models using Kaldi to Build An Automatic Speech Recognition System for Kannada Language
    Yadava, Thimmaraja G.
    Jayanna, H. S.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 161 - 165
  • [45] Impact of the COVID-19 pandemic on speech therapy for children with Speech and Language Disorders
    Hackenberg, Berit
    Buettner, Matthias
    Grosse, Lisa
    Martin, Evgenia
    Cordier, Dahlia
    Matthias, Christoph
    Laessig, Anne Katrin
    LARYNGO-RHINO-OTOLOGIE, 2021,
  • [46] Emotional Speech Recognition of Holocaust Survivors with Deep Neural Network Models for Russian Language
    Bukreeva, Liudmila
    Guseva, Daria
    Dolgushin, Mikhail
    Evdokimova, Vera
    Obotnina, Vasilisa
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 68 - 76
  • [47] Acoustic differences in emotional speech of people with dysarthria
    Alhinti, Lubna
    Christensen, Heidi
    Cunningham, Stuart
    SPEECH COMMUNICATION, 2021, 126 : 44 - 60
  • [48] Acoustic Analysis and Automatic Recognition of Spontaneous Children's Speech
    Gerosa, M.
    Giuliani, D.
    Narayanan, S.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1886 - +
  • [49] Using Neutral Speech Models for Emotional Speech Analysis
    Busso, Carlos
    Lee, Sungbok
    Narayanan, Shrikanth S.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2304 - 2307
  • [50] Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models
    Jorge, Javier
    Gimenez, Adria
    Silvestre-Cerda, Joan Albert
    Civera, Jorge
    Sanchis, Albert
    Juan, Alfons
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 148 - 161