Emotion Recognition from Speech Using the Bag-of-Visual Words on Audio Segment Spectrograms

Cited by: 13

Authors:

Spyrou, Evaggelos ^{[1,2,3]}

Nikopoulou, Rozalia ^{[4]}

Vernikos, Ioannis ^{[2]}

Mylonas, Phivos ^{[4]}

Affiliations:

[1] Natl Ctr Sci Res Demokritos, Inst Informat & Telecommun, Athens 15341, Greece

[2] Univ Thessaly, Dept Comp Sci, Lamia 38221, Greece

[3] Technol Educ Inst Sterea Ellada, Dept Comp Engn TE, Lamia 34400, Greece

[4] Ionian Univ, Dept Informat, Corfu 49132, Greece

Source:

TECHNOLOGIES | 2019 / Volume 7 / Issue 01

Funding:

European Union Horizon 2020;

Keywords:

emotion recognition; bag-of-visual words; spectrograms; FEATURES; DIRECTIONS;

DOI:

10.3390/technologies7010020

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

Monitoring and understanding a human's emotional state plays a key role in current and forthcoming computational technologies. At the same time, this monitoring and analysis should be as unobtrusive as possible, since the digital world has been smoothly adopted in everyday life activities. In this framework, and within the domain of assessing humans' affective state during their educational training, the most popular approach is to use sensory equipment that allows observing them without any kind of direct contact. Thus, in this work, we focus on human emotion recognition from audio stimuli (i.e., human speech) using a novel approach based on a computer-vision-inspired methodology, namely the bag-of-visual words method, applied to several audio segment spectrograms. Each spectrogram is treated as the visual representation of the corresponding audio segment and may be analyzed with well-known traditional computer vision techniques, such as construction of a visual vocabulary, extraction of speeded-up robust features (SURF), quantization into a set of visual words, and image histogram construction. As a last step, support vector machine (SVM) classifiers are trained on the aforementioned information. Finally, to further generalize the proposed approach, we use publicly available datasets from several human languages to perform cross-language experiments on both actor-created and real-life datasets.
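
The vocabulary, quantization, and histogram steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes random 64-dimensional vectors as stand-ins for SURF descriptors extracted from spectrograms, plain k-means as the vocabulary-building algorithm, and L1 normalization of the histograms; the abstract specifies none of these. All function names (`build_vocabulary`, `bovw_histogram`, `fake_descriptors`) are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def build_vocabulary(descriptors, num_words, iterations=10, seed=0):
    """Assumed vocabulary step: plain k-means over pooled local descriptors."""
    rng = random.Random(seed)
    vocabulary = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                vocabulary[i] = [sum(col) / len(members) for col in zip(*members)]
    return vocabulary


def bovw_histogram(descriptors, vocabulary):
    """L1-normalized histogram of visual-word counts for one spectrogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Stand-in data (assumption): random vectors instead of real SURF descriptors.
data_rng = random.Random(42)


def fake_descriptors(n, dim=64):
    return [[data_rng.random() for _ in range(dim)] for _ in range(n)]


segments = [fake_descriptors(30) for _ in range(5)]   # 5 audio segments
pooled = [d for seg in segments for d in seg]          # all descriptors together
vocabulary = build_vocabulary(pooled, num_words=8)
histograms = [bovw_histogram(seg, vocabulary) for seg in segments]
```

Each entry of `histograms` is a fixed-length vector, one per audio segment, which is the kind of input an SVM classifier could then be trained on together with emotion labels.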


Pages: 14

50 items in total

[1] Using the Bag-of-Audio-Words approach for emotion recognition
Vetrab, Mercedes
Gosztolya, Gabor
[J]. ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2022, 14 (01) : 1 - 21
[2] Fractal dimension of bag-of-visual words
Ribas, Lucas Correia
Goncalves, Diogo Nunes
Silva, Jonathan de Andrade
de Castro, Amaury Antonio, Jr.
Bruno, Odemir Martinez
Goncalves, Wesley Nunes
[J]. PATTERN ANALYSIS AND APPLICATIONS, 2019, 22 (01) : 89 - 98
[3] Fractal dimension of bag-of-visual words
Lucas Correia Ribas
Diogo Nunes Gonçalves
Jonathan de Andrade Silva
Amaury Antônio de Castro Jr.
Odemir Martinez Bruno
Wesley Nunes Gonçalves
[J]. Pattern Analysis and Applications, 2019, 22 : 89 - 98
[4] Emotion recognition from speech using deep learning on spectrograms
Li, Xingguang
Song, Wenjun
Liang, Zonglin
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2791 - 2796
[5] Approximate Image Matching using Strings of Bag-of-Visual Words Representation
Hong Thinh Nguyen
Barat, Cecile
Ducottet, Christophe
[J]. PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, 2014, : 345 - 353
[6] Performance evaluation of large-scale object recognition system using bag-of-visual words model
Min-Uk Kim
Kyoungro Yoon
[J]. Multimedia Tools and Applications, 2015, 74 : 2499 - 2517
[7] Performance evaluation of large-scale object recognition system using bag-of-visual words model
Kim, Min-Uk
Yoon, Kyoungro
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (07) : 2499 - 2517
[8] Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition
Seo, Minji
Kim, Myungho
[J]. SENSORS, 2020, 20 (19) : 1 - 21
[9] Bag-of-Visual Words Based Automatic Image Annotation
Kebede, Biniyam
Getahun, Fekade
[J]. PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON), 2015,
[10] Two Strategies for Bag-of-Visual Words Feature Extraction
Tsai, Chih-Fong
[J]. 2018 7TH INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2018), 2018, : 970 - 971
