Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy

Cited by: 3
Authors
Gosztolya, Gabor [1]
Busa-Fekete, Robert [2]
Affiliations
[1] Univ Szeged, Hungarian Acad Sci, MTA SZTE Res Grp Artificial Intelligence, H-6720 Szeged, Hungary
[2] Google Inc New York, Google Res, New York, NY 10011 USA
Keywords
Task analysis; Feature extraction; Training; Speech processing; Histograms; Databases; Speech coding; Computational paralinguistics; classification; Bag-of-Audio-Words representation; ensemble learning; EMOTION; SUPPORT; STRATEGIES; SPEECH
DOI
10.1109/TASLP.2020.3044465
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Bag-of-Audio-Words (BoAW) is a recently introduced, effective feature extraction technique for computational paralinguistics: we cluster the frame-level training vectors and represent each speech utterance based on the clusters of its frames. Over the past few years, several improvements have been proposed for the original BoAW approach, but none of these studies has examined the impact of the stochastic nature of the clustering step. In this study we demonstrate experimentally that the random factor present in the BoAW clustering step is indeed propagated into the subsequent classification step, eventually leading to suboptimal classification performance. As a solution, we propose to train an ensemble of classifiers; that is, we repeat the BoAW codebook selection step several times, train a separate classifier model for each resulting BoAW representation, and combine their predictions. Our results, obtained on three different paralinguistic datasets, demonstrate that this ensemble technique makes the whole paralinguistic classification process more robust and improves classification performance. With it, we achieved the highest Unweighted Average Recall score reported so far on the iHEARu-EAT corpus.
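The ensemble procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: codewords are chosen by random sampling of training frames, the classifier is a toy nearest-centroid model, predictions are combined by averaging class scores, and the data are synthetic. The names `select_codebook`, `boaw_histogram`, `fit_centroids`, `class_scores`, `ensemble_predict` and `make_toy_data` are hypothetical.

```python
import numpy as np

def select_codebook(frames, size, rng):
    """Pick `size` training frames at random as codewords (assumed strategy)."""
    idx = rng.choice(len(frames), size=size, replace=False)
    return frames[idx]

def boaw_histogram(utterance, codebook):
    """Assign each frame to its nearest codeword; return normalized counts."""
    dists = ((utterance[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

def fit_centroids(X, y, n_classes):
    """Toy stand-in classifier: one mean histogram per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def class_scores(X, centroids):
    """Softmax over negative squared distances to the class centroids."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-(d - d.min(axis=1, keepdims=True)))
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(train_utts, y_train, test_utts, n_models=5, size=16, seed=0):
    """Repeat codebook selection, train one model per codebook, average scores."""
    rng = np.random.default_rng(seed)
    n_classes = int(y_train.max()) + 1
    all_frames = np.vstack(train_utts)
    total = np.zeros((len(test_utts), n_classes))
    for _ in range(n_models):
        codebook = select_codebook(all_frames, size, rng)  # the random step
        X_train = np.stack([boaw_histogram(u, codebook) for u in train_utts])
        X_test = np.stack([boaw_histogram(u, codebook) for u in test_utts])
        centroids = fit_centroids(X_train, y_train, n_classes)
        total += class_scores(X_test, centroids)
    return (total / n_models).argmax(axis=1)

def make_toy_data(n_per_class, rng):
    """Synthetic two-class 'utterances' of variable length (illustration only)."""
    utts, labels = [], []
    for c, mean in enumerate([-2.0, 2.0]):
        for _ in range(n_per_class):
            n_frames = int(rng.integers(20, 40))
            utts.append(rng.normal(mean, 1.0, size=(n_frames, 3)))
            labels.append(c)
    return utts, np.array(labels)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train_utts, y_train = make_toy_data(10, rng)
    test_utts, y_test = make_toy_data(5, rng)
    pred = ensemble_predict(train_utts, y_train, test_utts)
    print("accuracy:", (pred == y_test).mean())
```

Setting `n_models=1` reduces the sketch to a single-codebook BoAW pipeline, so varying `seed` in that configuration is one way to observe how the random codebook choice affects the downstream predictions.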
Pages: 477-488
Page count: 12
Related papers
23 in total
  • [1] Bag-of-Audio-Words Approach for Multimedia Event Classification
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2103 - 2106
  • [2] SOFTENING QUANTIZATION IN BAG-OF-AUDIO-WORDS
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Robust Sound Event Classification Using LBP-HOG Based Bag-of-Audio-Words Feature Representation
    Lim, Hyungjun
    Kim, Myung Jong
    Kim, Hoirin
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3325 - 3329
  • [4] Investigating the Corpus Independence of the Bag-of-Audio-Words Approach
    Vetrab, Mercedes
    Gosztolya, Gabor
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 285 - 293
  • [5] Using the Bag-of-Audio-Words approach for emotion recognition
    Vetrab, Mercedes
    Gosztolya, Gabor
    [J]. ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2022, 14 (01) : 1 - 21
  • [6] N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 778 - 782
  • [7] Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification
    Gosztolya, Gabor
    [J]. INTERSPEECH 2019, 2019, : 3940 - 3944
  • [8] At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech
    Schmitt, Maximilian
    Ringeval, Fabien
    Schuller, Bjoern
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 495 - 499
  • [9] Estimating the degree of conflict in speech by employing Bag-of-Audio-Words and Fisher Vectors
    Gosztolya, Gábor
    [J]. Expert Systems with Applications, 2022, 205
  • [10] EXPANDED BAG OF WORDS REPRESENTATION FOR OBJECT CLASSIFICATION
    Liu, Tinglin
    Liu, Jing
    Liu, Qinshan
    Lu, Hanqing
    [J]. 2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 297 - 300