Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition

Cited by: 4
Authors
Sztaho, David [1 ]
Vicsi, Klara [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Mediainformat, Magyar Tudosok Korutja 2, H-1117 Budapest, Hungary
Source
Keywords
Speech acoustics; speech segmentation; hidden Markov models; speech processing
DOI
10.3233/IDT-140199
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In speech communication, emotions play an important role in conveying information. They arise partly as reactions to our environment and to our partners during a conversation, so understanding these reactions and recognizing them automatically is highly important: through them, we can get a clearer picture of our conversational partner's response. In Cognitive Infocommunications, this kind of information helps in developing robots and devices that are more aware of the user's needs, making them easier and more enjoyable to use. In our laboratory we conducted automatic emotion classification and speech segmentation experiments. To develop an automatic, speech-based emotion recognition system, an automatic speech segmenter is also needed to separate the speech segments required for emotion analysis. In our earlier research we found that the intonational phrase can be a suitable unit of emotion analysis. In this paper, speech detection and segmentation methods are developed. For speech detection, hidden Markov models are used with various noise and speech acoustic models. The results show that the procedure detects speech in the sound signal with more than 91% accuracy and segments it into intonational phrases.
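The abstract describes HMM-based speech detection built on separate noise and speech acoustic models. As a minimal, self-contained illustration of that idea (not the authors' implementation), the sketch below decodes a two-state HMM (noise vs. speech) over per-frame energy values with Viterbi decoding; the single-Gaussian emission parameters, the `stay` self-transition probability, and the use of frame energy as the sole feature are all assumptions made for this example.

```python
import math

def gauss_logpdf(x, mean, var):
    # Log density of a 1-D Gaussian; frame energy is the single feature here.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_vad(energies, speech=(0.8, 0.05), noise=(0.1, 0.05), stay=0.9):
    """Two-state (0 = noise, 1 = speech) Viterbi decoding over frame energies.

    `speech` and `noise` are illustrative (mean, variance) pairs for
    single-Gaussian acoustic models; `stay` is the self-transition
    probability, which discourages rapid state flips.
    """
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    models = (noise, speech)
    # Initialise with a uniform state prior.
    delta = [math.log(0.5) + gauss_logpdf(energies[0], *m) for m in models]
    back = []
    for e in energies[1:]:
        prev, delta, ptr = delta, [], []
        for s, m in enumerate(models):
            cand = [prev[q] + (log_stay if q == s else log_switch) for q in (0, 1)]
            best = max(range(2), key=lambda q: cand[q])
            delta.append(cand[best] + gauss_logpdf(e, *m))
            ptr.append(best)
        back.append(ptr)
    # Backtrack the most likely state sequence.
    state = max(range(2), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy energy contour: quiet - loud - quiet.
frames = [0.1, 0.12, 0.75, 0.8, 0.82, 0.15, 0.1]
labels = viterbi_vad(frames)  # → [0, 0, 1, 1, 1, 0, 0]
```

Contiguous runs of speech frames in `labels` would then be cut into intonational phrases in a further step, which this sketch does not attempt.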
Pages: 315-324
Page count: 10
Related papers
50 records in total
  • [21] Automatic speech recognition and speech activity detection in the CHIL smart room
    Chu, SM
    Marcheret, E
    Potamianos, G
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 332 - 343
  • [22] The Impact of Face Mask and Emotion on Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER)
    Oh, Qi Qi
    Seow, Chee Kiat
    Yusuff, Mulliana
    Pranata, Sugiri
    Cao, Qi
    [J]. 2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 523 - 531
  • [23] Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features
    Starlet Ben Alex
    Leena Mary
    Ben P. Babu
    [J]. Circuits, Systems, and Signal Processing, 2020, 39 : 5681 - 5709
  • [24] Automatic detection of a prosodic hierarchy in a journalistic speech corpus
    Gendrot, Cedric
    Gerdes, Kim
    Adda-Decker, Martine
    [J]. LANGUE FRANCAISE, 2016, (191): : 123 - +
  • [26] UNIT SEARCH - SPEECH SEGMENTATION AND PROCESSING
    CHRISTOPHE, A
    PALLIER, C
    BERTONCINI, J
    MEHLER, J
    [J]. ANNEE PSYCHOLOGIQUE, 1991, 91 (01): : 59 - 86
  • [27] Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
    Haderlein, Tino
    Moers, Cornelia
    Moebius, Bernd
    Rosanowski, Frank
    Noeth, Elmar
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 195 - 202
  • [28] On the Influence of Automatic Segmentation and Clustering in Automatic Speech Recognition
    Lopez-Otero, Paula
    Docio-Fernandez, Laura
    Garcia-Mateo, Carmen
    Cardenal-Lopez, Antonio
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, 2012, 328 : 49 - 58
  • [29] Automatic Emotion Recognition of Speech Signal in Mandarin
    Zhang, Sheng
    Ching, P. C.
    Kong, Fanrang
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1810 - +
  • [30] LEARNING WITH SYNTHESIZED SPEECH FOR AUTOMATIC EMOTION RECOGNITION
    Schuller, Bjoern
    Burkhardt, Felix
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5150 - 5153