On the Impact of Children's Emotional Speech on Acoustic and Language Models

Authors
Stefan Steidl
Anton Batliner
Dino Seppi
Björn Schuller
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung
[2] Katholieke Universiteit Leuven, ESAT
[3] Technische Universität München, Institute for Human-Machine Communication
Keywords
Language Model; Automatic Speech Recognition; Acoustic Model; Baseline System; Emotional Speech
DOI
Not available
Abstract
The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous motherese, emphatic, and angry children's speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, "emotional" speech recognisers are compared to a speech recogniser trained on neutral speech only. For this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a "neutral" speech recogniser trained on large amounts of neutral speech is adapted by adding only a small amount of emotionally coloured data in the training process. The results show that emphatic and angry speech is recognised best, even better than neutral speech, and that performance can be improved further by adapting the acoustic and linguistic models. To show the variability of emotional speech, we visualise the distribution of the four emotion-related states in the MFCC space by applying a Sammon transformation.
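The Sammon transformation mentioned in the abstract is a non-linear projection that maps high-dimensional points (here, MFCC vectors) to 2-D while preserving pairwise distances. As a rough illustration of the idea only (the function name, parameters, and the plain gradient-descent optimiser are assumptions, not the paper's actual procedure), a minimal sketch with NumPy could look like:

```python
import numpy as np

def sammon(X, n_iter=300, lr=0.3, eps=1e-9, seed=0):
    """Project X (n samples x d features) to 2-D by minimising Sammon stress
    with plain gradient descent (the classic algorithm uses a Newton-like step)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # pairwise distances in the original (e.g. MFCC) space
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D = np.maximum(D, eps)
    # normalising constant c = 1 / sum of original distances over all pairs
    scale = 1.0 / D[np.triu_indices(n, 1)].sum()
    # small random initial 2-D embedding
    Y = rng.standard_normal((n, 2)) * 1e-2
    for _ in range(n_iter):
        d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        d = np.maximum(d, eps)
        # gradient of the Sammon stress  E = c * sum_{i<j} (D_ij - d_ij)^2 / D_ij
        ratio = (D - d) / (D * d)
        np.fill_diagonal(ratio, 0.0)
        grad = -2.0 * scale * (ratio[:, :, None]
                               * (Y[:, None, :] - Y[None, :, :])).sum(1)
        Y -= lr * grad
    return Y
```

Applied per frame to MFCC vectors of the four emotion-related states, such a projection gives the kind of 2-D scatter plot the paper uses to visualise their overlap.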
Related Papers
50 items in total
  • [41] UK speech and language therapists' assessment of children's expressive language, and functional impairment and impact, following the CATALISE publications
    Waine, Hannah
    Bates, Sally
    Frizelle, Pauline
    Oh, Tomasina M. M.
    INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS, 2023, 58 (05) : 1570 - 1587
  • [42] Dependence of the perception of emotional information of speech on the acoustic parameters of the stimulus in children of various ages
    Dmitrieva E.S.
    Gel'man V.Ya.
    Zaitseva K.A.
    Orlov A.M.
    Human Physiology, 2008, 34 (4) : 527 - 531
  • [43] The challenges of evaluation: assessing Early Talk's impact on speech language and communication practice in children's centres
    Jopling, Michael
    Whitmarsh, Judy
    Hadfield, Mark
    INTERNATIONAL JOURNAL OF EARLY YEARS EDUCATION, 2013, 21 (01) : 70 - 84
  • [44] Creating Language and Acoustic Models using Kaldi to Build An Automatic Speech Recognition System for Kannada Language
    Yadava, Thimmaraja G.
    Jayanna, H. S.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 161 - 165
  • [45] Impact of the COVID-19 pandemic on speech therapy for children with Speech and Language Disorders
    Hackenberg, Berit
    Buettner, Matthias
    Grosse, Lisa
    Martin, Evgenia
    Cordier, Dahlia
    Matthias, Christoph
    Laessig, Anne Katrin
    LARYNGO-RHINO-OTOLOGIE, 2021,
  • [46] Emotional Speech Recognition of Holocaust Survivors with Deep Neural Network Models for Russian Language
    Bukreeva, Liudmila
    Guseva, Daria
    Dolgushin, Mikhail
    Evdokimova, Vera
    Obotnina, Vasilisa
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 68 - 76
  • [47] Acoustic differences in emotional speech of people with dysarthria
    Alhinti, Lubna
    Christensen, Heidi
    Cunningham, Stuart
    SPEECH COMMUNICATION, 2021, 126 : 44 - 60
  • [48] Acoustic Analysis and Automatic Recognition of Spontaneous Children's Speech
    Gerosa, M.
    Giuliani, D.
    Narayanan, S.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1886 - +
  • [49] Using Neutral Speech Models for Emotional Speech Analysis
    Busso, Carlos
    Lee, Sungbok
    Narayanan, Shrikanth S.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2304 - 2307
  • [50] Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models
    Jorge, Javier
    Gimenez, Adria
    Silvestre-Cerda, Joan Albert
    Civera, Jorge
    Sanchis, Albert
    Juan, Alfons
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 148 - 161