Using the Bag-of-Audio-Words approach for emotion recognition

Cited by: 0
Authors
Vetrab, Mercedes [1, 2]
Gosztolya, Gabor [1, 2]
Affiliations
[1] Univ Szeged, Inst Int 3, Arpad Ter 2, Szeged, Hungary
[2] ELKH SZTE Res Grp Artificial Intelligence, Tisza Lajos Korut 103, Szeged, Hungary
Keywords
bag-of-audio-words; emotion detection; human voice; sound processing; speech; classification; vectors
DOI
10.2478/ausi-2022-0001
CLC classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Recordings of varying length are a well-known problem in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words (BoAW) feature extraction approach, whose steps are preprocessing, clustering, quantization and normalization. The bag-of-audio-words technique is competitive in speech emotion recognition, but it has several parameters that must be tuned precisely to perform well. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one by one, with each step building on the results of the previous ones. We performed feature extraction using openSMILE, then transformed the features into same-sized vectors with openXBOW, and finally trained SVM models and evaluated them with 10-fold cross-validation, using unweighted average recall (UAR) as the metric. In our experiments, we worked with a Hungarian emotion database. According to our results, the bag-of-audio-words feature representation improves emotion classification performance. Although not every BoAW parameter has a single optimal setting, we can make clear recommendations on how to set bag-of-audio-words parameters for emotion detection tasks.
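The BoAW steps named in the abstract (preprocessing, clustering, quantization, normalization) can be illustrated with a minimal NumPy sketch of the generic technique. This is not openXBOW's implementation, and every function name here is hypothetical. Assumptions: z-score standardization as preprocessing, plain k-means for building the codebook, each frame assigned to its `n_assign` nearest audio words, log(1 + count) term weighting, and L1 normalization of the histogram. The data are synthetic random frames, not the Hungarian emotion database. The `uar` helper computes unweighted average recall as the mean of per-class recalls.

```python
import numpy as np

def standardize(frames_list):
    """Preprocessing (assumed): z-normalize each feature dimension using pooled statistics."""
    stacked = np.vstack(frames_list)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return [(f - mean) / std for f in frames_list], mean, std

def kmeans_codebook(frames, size, iters=20, seed=0):
    """Clustering: build a codebook of `size` audio words with plain Lloyd k-means."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size, replace=False)].copy()
    for _ in range(iters):
        dists = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(size):
            members = frames[labels == k]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    return centers

def boaw_histogram(frames, codebook, n_assign=1, log_tf=True):
    """Quantization + normalization: map a variable-length recording to a fixed-size vector."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :n_assign]  # each frame votes for its n_assign closest words
    counts = np.bincount(nearest.ravel(), minlength=len(codebook)).astype(float)
    if log_tf:
        counts = np.log1p(counts)  # dampen the influence of very frequent words
    total = counts.sum()
    return counts / total if total > 0 else counts  # L1 normalization (assumed)

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls, each class weighted equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in classes]))

# Synthetic "recordings" with different frame counts and 4 feature dimensions (placeholder data).
rng = np.random.default_rng(1)
recordings = [rng.normal(size=(n, 4)) for n in (50, 120, 80)]
scaled, mean, std = standardize(recordings)
codebook = kmeans_codebook(np.vstack(scaled), size=8)
vectors = np.array([boaw_histogram(r, codebook, n_assign=2) for r in scaled])
print(vectors.shape)  # (3, 8): same size regardless of recording length
print(uar([0, 0, 0, 1], [0, 0, 1, 1]))  # (2/3 + 1) / 2, about 0.833
```

In a full pipeline, vectors like these would be fed to an SVM classifier (for example scikit-learn's `SVC`) under 10-fold cross-validation, with the codebook and standardization statistics computed on the training folds only. Codebook size and the number of assignments are among the parameters whose values the abstract says must be tuned.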
Pages: 1-21
Page count: 21