Influence of Emotional Speech on Continuous Speech Recognition

Cited: 0
Authors
Zgank, Andrej [1 ]
Maucec, Mirjam Sepesy [1 ]
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, Maribor, Slovenia
Keywords
speech recognition; emotional speech; highly inflected language; Human-Computer Interaction; CLASSIFICATION; FEATURES;
DOI
10.1109/elektro49696.2020.9130316
Chinese Library Classification
TP39 [Applications of Computers];
Subject Classification Codes
081203; 0835
Abstract
Emotions are an important part of human communication, but they can present harsh conditions for an automatic continuous speech recognition system. This paper presents an analysis of the extent to which emotional speech degrades speech recognition accuracy for the highly inflected Slovenian language. Language characteristics also influence speech recognition performance, and inflection is among the most challenging of them. Moreover, Slovenian, like other Slavic languages, belongs to the group of under-resourced languages. The speech recognition system was developed with the Slovenian BNSI Broadcast News speech database, and the Interface speech database was used for the experiments with emotional speech. The analysis was carried out with HMM and DNN acoustic models, combined with a 3-gram statistical language model. The results show that emotional speech degrades speech recognition accuracy by between 5% and 7% absolute.
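The reported degradation is an absolute drop in recognition accuracy between neutral and emotional test conditions. As a minimal illustrative sketch (not the authors' evaluation code), the Python snippet below computes word accuracy as 100% minus word error rate via word-level Levenshtein alignment and reports the absolute degradation between two conditions; the transcripts and values in it are hypothetical.

# Minimal sketch (hypothetical example, not the paper's evaluation code):
# word accuracy from word error rate via Levenshtein alignment, and the
# absolute accuracy degradation between a neutral and an emotional condition.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def word_accuracy(reference: str, hypothesis: str) -> float:
    return 100.0 * (1.0 - word_error_rate(reference, hypothesis))

if __name__ == "__main__":
    # Hypothetical neutral vs. emotional recognition outputs.
    ref = "the weather will be sunny tomorrow"
    hyp_neutral = "the weather will be sunny tomorrow"
    hyp_emotional = "the weather will be funny tomorrow"
    acc_neutral = word_accuracy(ref, hyp_neutral)
    acc_emotional = word_accuracy(ref, hyp_emotional)
    print(f"neutral accuracy:     {acc_neutral:.1f}%")
    print(f"emotional accuracy:   {acc_emotional:.1f}%")
    print(f"absolute degradation: {acc_neutral - acc_emotional:.1f}%")

In this toy run the degradation is 1/6 of the reference words, i.e. about 16.7% absolute; the paper's reported 5-7% absolute figures refer to its actual Slovenian test sets, not to this example.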
Pages: 4