Using the Bag-of-Audio-Words approach for emotion recognition

Authors:

Vetrab, Mercedes [1,2]

Gosztolya, Gabor [1,2]

Affiliations:

[1] Univ Szeged, Inst Int 3, Arpad Ter 2, Szeged, Hungary

[2] ELKH SZTE Res Grp Artificial Intelligence, Tisza Lajos Korut 103, Szeged, Hungary

Source:

ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA | 2022 / Vol. 14 / Issue 01

Keywords:

bag-of-audio-words; emotion detection; human voice; sound processing; SPEECH; CLASSIFICATION; VECTORS

DOI:

10.2478/ausi-2022-0001

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Recordings of varying length are a well-known problem in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words (BoAW) feature extraction approach. The technique consists of four steps: preprocessing, clustering, quantization and normalization. BoAW is competitive in speech emotion recognition, but it has several parameters that must be tuned carefully to perform well. The main aim of our study was to analyse the effectiveness of the bag-of-audio-words method and to find the best parameter values for emotion recognition. We optimized the parameters one at a time, with each step building on the results of the previous ones. We performed the feature extraction using openSMILE, transformed the features into same-sized vectors with openXBOW, and finally trained SVM models, evaluating them with 10-fold cross-validation and the unweighted average recall (UAR) metric. In our experiments, we worked with a Hungarian emotion database. According to our results, emotion classification performance improves with the bag-of-audio-words feature representation. Although not every BoAW parameter has a single optimal setting, we can make clear recommendations on how to set bag-of-audio-words parameters for emotion detection tasks.
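The clustering, quantization and normalization steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call openSMILE or openXBOW: it is a pure-Python toy, and the function names (`kmeans`, `nearest`, `boaw_vector`, `uar`), the use of plain k-means for the codebook, and the L1 normalization of the histogram are all assumptions made for illustration. It shows how recordings with different numbers of frames map to vectors of the same size, and how UAR could be computed.

```python
import math
import random


def nearest(frame, centroids):
    """Index of the centroid (audio word) closest to one frame."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(frame, centroids[i]))


def kmeans(frames, k, iters=20, seed=0):
    """Clustering step (assumed: plain Lloyd's k-means) that builds the codebook."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def boaw_vector(frames, centroids):
    """Quantization plus normalization: histogram of audio words, summing to 1."""
    counts = [0] * len(centroids)
    for f in frames:
        counts[nearest(f, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts]  # assumed L1 normalization


def uar(y_true, y_pred):
    """Unweighted average recall: mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: two "recordings" with 5 and 40 two-dimensional frames.
rng = random.Random(1)
short_rec = [[rng.random(), rng.random()] for _ in range(5)]
long_rec = [[rng.random(), rng.random()] for _ in range(40)]

codebook = kmeans(short_rec + long_rec, k=3)
v_short = boaw_vector(short_rec, codebook)
v_long = boaw_vector(long_rec, codebook)
# Both vectors have length 3, regardless of the recording length.
```

The fixed-length vectors produced this way are what a classifier such as an SVM would consume. The codebook size `k` is one example of the kind of parameter the abstract says needs tuning.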

Pages: 1 / 21

Page count: 21
