On the Impact of Children's Emotional Speech on Acoustic and Language Models

Cited by: 0
Authors
Stefan Steidl
Anton Batliner
Dino Seppi
Björn Schuller
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Mustererkennung
[2] Katholieke Universiteit Leuven, ESAT
[3] Technische Universität München, Institute for Human-Machine Communication
Keywords
Language Model; Automatic Speech Recognition; Acoustic Model; Baseline System; Emotional Speech
DOI
Not available
Abstract
The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous motherese, emphatic, and angry children's speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, "emotional" speech recognisers are compared to a speech recogniser trained on neutral speech only. For this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a "neutral" speech recogniser trained on large amounts of neutral speech is adapted by adding only some emotionally coloured data in the training process. The results show that emphatic and angry speech is recognised best, even better than neutral speech, and that the performance can be improved further by adaptation of the acoustic and linguistic models. In order to show the variability of emotional speech, we visualise the distribution of the four emotion-related states in the MFCC space by applying a Sammon transformation.
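The abstract reports results in terms of word accuracy but does not define it. As a minimal sketch, assuming the common definition WA = (N - S - D - I) / N over a minimum-edit-distance alignment of reference and hypothesis word sequences (N reference words, S substitutions, D deletions, I insertions), it could be computed as below. The function name `word_accuracy` and the whitespace tokenisation are illustrative assumptions, not taken from the paper.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (S + D + I) / N, with N the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    The value can be negative when the hypothesis contains many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    # prev[j] = minimum edits to turn the ref prefix seen so far into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    errors = prev[-1]  # S + D + I for the cheapest alignment
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # Made-up example strings, not data from the study.
    print(word_accuracy("go to the left", "go to left"))
```

Because insertions are counted as errors, word accuracy is stricter than the fraction of correctly recognised words and can drop below zero for very noisy output.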