At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech

Times cited: 99
Authors
Schmitt, Maximilian [1]
Ringeval, Fabien [1]
Schuller, Bjoern [1, 2]
Affiliations
[1] University of Passau, Chair of Complex and Intelligent Systems, Passau, Germany
[2] Imperial College London, Department of Computing, London, England
Funding
EU Horizon 2020; EU Seventh Framework Programme (FP7);
Keywords
speech analysis; speech emotion recognition; bag-of-audio-words; computational paralinguistics;
DOI
10.21437/Interspeech.2016-1124
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Recognition of natural emotion in speech is a challenging task. Different methods have been proposed to tackle this complex task, such as acoustic feature brute-forcing or even end-to-end learning. Recently, bag-of-audio-words (BoAW) representations of acoustic low-level descriptors (LLDs) have been employed successfully in acoustic event classification and other audio recognition tasks. In this approach, the feature vectors of acoustic LLDs are quantised according to a learnt codebook of audio words, and a histogram of the occurring 'words' is then built. Despite their massive potential, BoAW have not been thoroughly studied in emotion recognition. Here, we propose a method using BoAW created only from mel-frequency cepstral coefficients (MFCCs). Support vector regression is then used to predict emotion continuously in time and value, for example in the dimensions of arousal and valence. We compare this approach with the computation of functionals of the MFCCs and perform extensive evaluations on the RECOLA database, which features spontaneous and natural emotions. Results show that the BoAW representation of MFCCs not only performs significantly better than functionals but also outperforms, by far, most recently published deep learning approaches, including convolutional and recurrent networks.
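The quantise-and-count step described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation and makes several assumptions: the codebook is learnt by randomly sampling training frames (k-means is a common alternative), each frame is assigned to its nearest audio words by Euclidean distance, and `learn_codebook`, `bag_of_audio_words`, `sliding_boaw`, `functionals`, the `n_assign` multiple-assignment parameter, and the window and hop sizes are all hypothetical names and values chosen for illustration. Random 13-dimensional vectors stand in for MFCC frames, and the functionals baseline is reduced to per-coefficient mean and standard deviation.

```python
import numpy as np


def learn_codebook(train_llds: np.ndarray, size: int, seed: int = 0) -> np.ndarray:
    """Learn a codebook by drawing `size` distinct training frames at random.

    Assumption: random sampling is used for simplicity; k-means is a common alternative.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_llds), size=size, replace=False)
    return train_llds[idx]


def bag_of_audio_words(llds: np.ndarray, codebook: np.ndarray, n_assign: int = 1) -> np.ndarray:
    """Quantise each LLD frame to its `n_assign` nearest audio words and count them."""
    # Squared Euclidean distance from every frame to every word: shape (frames, words).
    dist = ((llds[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Indices of the n_assign closest words for each frame.
    nearest = np.argsort(dist, axis=1)[:, :n_assign]
    # Histogram of word occurrences, one bin per audio word.
    return np.bincount(nearest.ravel(), minlength=len(codebook)).astype(float)


def sliding_boaw(llds: np.ndarray, codebook: np.ndarray, win: int, hop: int,
                 n_assign: int = 1) -> np.ndarray:
    """One BoAW histogram per window of `win` frames, advanced by `hop` frames."""
    starts = range(0, max(len(llds) - win, 0) + 1, hop)
    return np.stack([bag_of_audio_words(llds[s:s + win], codebook, n_assign) for s in starts])


def functionals(llds: np.ndarray) -> np.ndarray:
    """Baseline summary: per-coefficient mean and standard deviation over the frames."""
    return np.concatenate([llds.mean(axis=0), llds.std(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    train = rng.normal(size=(1000, 13))    # stand-in for 13-dim MFCC training frames
    codebook = learn_codebook(train, size=100)
    segment = rng.normal(size=(50, 13))    # stand-in for 50 frames of one recording
    h = bag_of_audio_words(segment, codebook)
    print(h.shape, h.sum())                # (100,) 50.0
    print(sliding_boaw(segment, codebook, win=20, hop=10).shape)  # (4, 100)
    print(functionals(segment).shape)      # (26,)
```

Each row of the `sliding_boaw` output is a term-frequency vector for one window, so a sequence of such rows, paired with time-aligned arousal or valence labels, could be passed to a regressor such as scikit-learn's `sklearn.svm.SVR`. Whether to normalise or log-compress the counts first is a further design choice not shown here.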
Pages: 495-499
Number of pages: 5