Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition

Cited by: 4
Authors
Sztaho, David [1]
Vicsi, Klara [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Mediainformat, Magyar Tudosok Korutja 2, H-1117 Budapest, Hungary
Source
Keywords
Speech acoustics; speech segmentation; Hidden Markov Models; speech processing
DOI
10.3233/IDT-140199
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In speech communication, emotions play a major role in conveying information. They arise partly as reactions to our environment and to our partners during a conversation. Understanding these reactions and recognizing them automatically is therefore important: they give a clearer picture of how a partner is responding in a conversation. In Cognitive Info Communication, this kind of information helps in developing robots and devices that are more aware of the user's needs, making them easier and more enjoyable to use. In our laboratory we conducted automatic emotion classification and speech segmentation experiments. An automatic emotion recognition system based on speech also needs an automatic speech segmenter to separate out the speech segments used for emotion analysis. Our earlier research found that the intonational phrase is a suitable unit for emotion analysis. This paper develops speech detection and segmentation methods. For speech detection, Hidden Markov Models are used with various noise and speech acoustic models. The results show that the procedure detects speech in the sound signal with more than 91% accuracy and segments it into intonational phrases.
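The abstract names Hidden Markov Models with noise and speech acoustic models as the speech detection method but gives no further detail. The following is only a minimal sketch of the general idea, not the authors' system: it assumes a two-state HMM (noise, speech), a single Gaussian per state over frame log-energy, and Viterbi decoding. The function names, the means and variances, and the `stay_prob` parameter are all hypothetical values chosen for illustration.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_speech_detect(log_energy, stay_prob=0.9):
    """Label each frame 0 (noise) or 1 (speech) with a two-state HMM.

    All model parameters below are assumed for illustration only.
    """
    means = (-6.0, -1.0)      # assumed log-energy means: noise, speech
    variances = (1.0, 1.0)    # assumed variances
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    if not log_energy:
        return []

    # Uniform initial state distribution.
    score = [
        math.log(0.5) + gaussian_logpdf(log_energy[0], means[s], variances[s])
        for s in (0, 1)
    ]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for x in log_energy[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [
                score[p] + (log_stay if p == s else log_switch)
                for p in (0, 1)
            ]
            best_prev = 0 if cands[0] >= cands[1] else 1
            pointers.append(best_prev)
            new_score.append(
                cands[best_prev] + gaussian_logpdf(x, means[s], variances[s])
            )
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def frames_to_segments(labels):
    """Collapse frame labels into (start, end) speech runs, end exclusive."""
    segments, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments
```

The transition probabilities act as a smoother: with `stay_prob=0.9`, a single mildly louder frame inside a noise stretch is not enough to justify two state switches, so it stays labelled as noise. Segmenting the detected speech into intonational phrases, as the paper describes, would need additional prosodic modelling that this sketch does not attempt.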
Pages: 315-324
Page count: 10
Related papers
50 in total
  • [31] Emotion recognition from speech using global and local prosodic features
    Rao K.S.
    Koolagudi S.G.
    Vempada R.R.
    [J]. International Journal of Speech Technology, 2013, 16 (2) : 143 - 160
  • [32] Emotion recognition from speech using source, system, and prosodic features
    Koolagudi, Shashidhar G.
    Rao, K. Sreenivasa
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (02) : 265 - 289
  • [33] PERFORMANCE ANALYSIS OF SPECTRAL AND PROSODIC FEATURES AND THEIR FUSION FOR EMOTION RECOGNITION IN SPEECH
    Gaurav, Manish
    [J]. 2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 313 - 316
  • [34] A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
    Zhou, Yu
    Li, Junfeng
    Sun, Yanqing
    Zhang, Jianping
    Yan, Yonghong
    Akagi, Masato
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (10) : 2813 - 2821
  • [35] Study of prosodic feature extraction for multidialectal Odia speech emotion recognition
    Swain, Monorama
    Routray, Aurobinda
    Kabisatpathy, P.
    Kundu, Jogendra N.
    [J]. PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 1644 - 1649
  • [36] Improving Speech Emotion Recognition System Using Spectral and Prosodic Features
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 399 - 409
  • [37] Emotion recognition from speech using source, system, and prosodic features
    Shashidhar G. Koolagudi
    K. Sreenivasa Rao
    [J]. International Journal of Speech Technology, 2012, 15 (2) : 265 - 289
  • [38] On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3618 - 3622
  • [39] Research on Speech Processing and Automatic Recognition of Children with Language Disorders and their Emotion Analysis System
    Qiang, Yali
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 127 : 33 - 33
  • [40] Automatic labeling inconsistencies detection and correction for sentence unit segmentation in conversational speech
    Cuendet, Sebastien
    Hakkani-Tuer, Dilek
    Shriberg, Elizabeth
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2008, 4892 : 144 - 155