Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy

Cited by: 3

Authors:

Gosztolya, Gabor [1]

Busa-Fekete, Robert [2]

Affiliations:

[1] Univ Szeged, Hungarian Acad Sci, MTA SZTE Res Grp Artificial Intelligence, H-6720 Szeged, Hungary

[2] Google Inc New York, Google Res, New York, NY 10011 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29

Keywords:

Task analysis; Feature extraction; Training; Speech processing; Histograms; Databases; Speech coding; Computational paralinguistics; classification; Bag-of-Audio-Words representation; ensemble learning; EMOTION; SUPPORT; STRATEGIES; SPEECH;

DOI:

10.1109/TASLP.2020.3044465

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Bag-of-Audio-Words (BoAW) is a recently introduced, effective feature extraction technique for computational paralinguistics: the frame-level training vectors are clustered, and each speech utterance is represented by the clusters its frames fall into. Over the past few years, several improvements to the original BoAW approach have been proposed, but none has examined the impact of the stochastic nature of the clustering step. In this study we demonstrate experimentally that the randomness in the BoAW clustering step propagates into the subsequent classification step, eventually leading to suboptimal classification performance. As a solution, we propose training an ensemble of classifiers: we repeat the BoAW codebook selection step several times, train a separate classifier model for each resulting BoAW representation, and combine their predictions. Our results, obtained on three different paralinguistic datasets, demonstrate that this ensemble technique makes the whole paralinguistic classification process more robust and improves classification performance. With it, we achieved the highest Unweighted Average Recall score reported so far on the iHEARu-EAT corpus.
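The ensemble procedure summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the clustering algorithm, the classifier, or the combination rule, so plain k-means, a nearest-centroid classifier (`CentroidClassifier`), and averaging of class probabilities are all assumptions, as are the function names, the codebook size, and the synthetic data.

```python
import numpy as np

def nearest_word(frames, codebook):
    """Index of the closest codebook entry for every frame."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def fit_codebook(frames, n_words, seed, n_iter=10):
    """Plain k-means (an assumption); the seed controls the random initialisation."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=n_words, replace=False)].copy()
    for _ in range(n_iter):
        assign = nearest_word(frames, codebook)
        for k in range(n_words):
            members = frames[assign == k]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook

def boaw_histogram(frames, codebook):
    """Normalised histogram of codeword counts for one utterance."""
    counts = np.bincount(nearest_word(frames, codebook), minlength=len(codebook))
    return counts / counts.sum()

class CentroidClassifier:
    """Stand-in classifier (an assumption): softmax over negative distances to class means."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        scores = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return scores / scores.sum(axis=1, keepdims=True)

def ensemble_predict(train_utts, train_labels, test_utts, n_words=8, seeds=(0, 1, 2, 3, 4)):
    """One codebook and one classifier per seed; class probabilities are averaged."""
    all_frames = np.vstack(train_utts)
    labels = np.asarray(train_labels)
    probs = []
    for seed in seeds:
        codebook = fit_codebook(all_frames, n_words, seed)
        X_train = np.stack([boaw_histogram(u, codebook) for u in train_utts])
        X_test = np.stack([boaw_histogram(u, codebook) for u in test_utts])
        clf = CentroidClassifier().fit(X_train, labels)
        probs.append(clf.predict_proba(X_test))
    mean_probs = np.mean(probs, axis=0)
    return clf.classes_[mean_probs.argmax(axis=1)], mean_probs

def make_utterance(rng, label, n_frames=40, dim=4):
    """Synthetic utterance: class 0 frames centred at -1, class 1 frames at +1."""
    centre = -1.0 if label == 0 else 1.0
    return rng.normal(loc=centre, scale=1.0, size=(n_frames, dim))

rng = np.random.default_rng(42)
train_labels = [0, 1] * 10
train_utts = [make_utterance(rng, y) for y in train_labels]
test_labels = [0, 1] * 5
test_utts = [make_utterance(rng, y) for y in test_labels]
pred, mean_probs = ensemble_predict(train_utts, train_labels, test_utts)
```

Each seed yields a different codebook and therefore a different histogram representation of the same utterances. Averaging over seeds is one simple way to reduce the dependence of the final prediction on any single clustering run.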

Pages: 477-488

Page count: 12

