Emotion Recognition from Speech Using the Bag-of-Visual Words on Audio Segment Spectrograms

Cited by: 13
Authors
Spyrou, Evaggelos [1, 2, 3]
Nikopoulou, Rozalia [4]
Vernikos, Ioannis [2]
Mylonas, Phivos [4]
Affiliations
[1] Natl Ctr Sci Res Demokritos, Inst Informat & Telecommun, Athens 15341, Greece
[2] Univ Thessaly, Dept Comp Sci, Lamia 38221, Greece
[3] Technol Educ Inst Sterea Ellada, Dept Comp Engn TE, Lamia 34400, Greece
[4] Ionian Univ, Dept Informat, Corfu 49132, Greece
Funding
European Union Horizon 2020
Keywords
emotion recognition; bag-of-visual words; spectrograms; FEATURES; DIRECTIONS;
DOI
10.3390/technologies7010020
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Monitoring and understanding a person's emotional state plays a key role in current and forthcoming computational technologies. At the same time, this monitoring and analysis should be as unobtrusive as possible, since digital technology is now part of everyday activities. In this context, and within the domain of assessing people's affective state during educational training, the most popular approach is to use sensors that allow observation without any direct contact. Thus, in this work, we focus on human emotion recognition from audio stimuli (i.e., human speech) using a novel approach based on a computer-vision-inspired methodology, namely the bag-of-visual words method, applied to audio segment spectrograms. Each spectrogram is treated as the visual representation of its audio segment and may be analyzed with well-known traditional computer vision techniques, such as construction of a visual vocabulary, extraction of speeded-up robust features (SURF), quantization into a set of visual words, and image histogram construction. As a last step, support vector machine (SVM) classifiers are trained on the resulting representation. Finally, to further generalize the proposed approach, we use publicly available datasets, both actor-created and real-life, in several languages to perform cross-language experiments.
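The pipeline outlined in the abstract (spectrogram, local features, visual vocabulary, quantization, histogram) can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation. It makes several assumptions that do not come from the source: flattened grid patches stand in for SURF descriptors (SURF ships only in OpenCV's non-free contrib module), a plain Lloyd's k-means builds the vocabulary, and the frame length, hop, patch size, vocabulary size, and the two synthetic tones are arbitrary illustrative choices. All function names are hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (freq x time) from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

def patch_descriptors(image, size=8, step=8):
    """ASSUMPTION: stand-in for SURF. Flattened, L2-normalised grid patches."""
    rows, cols = image.shape
    patches = np.asarray([image[r:r + size, c:c + size].ravel()
                          for r in range(0, rows - size + 1, step)
                          for c in range(0, cols - size + 1, step)])
    norms = np.linalg.norm(patches, axis=1, keepdims=True)
    return patches / np.maximum(norms, 1e-12)

def nearest_word(descriptors, vocabulary):
    """Quantization: index of the closest visual word for each descriptor."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, n_words=16, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the cluster centres act as the visual words."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), n_words, replace=False)
    centres = descriptors[picks]  # fancy indexing returns a copy
    for _ in range(n_iter):
        labels = nearest_word(descriptors, centres)
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members):  # leave empty clusters where they are
                centres[k] = members.mean(axis=0)
    return centres

def bovw_histogram(descriptors, vocabulary):
    """Normalised count of visual words for one spectrogram."""
    words = nearest_word(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Two synthetic one-second "segments" at 8 kHz (illustrative data, not speech).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
sig_a = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(8000)
sig_b = np.sin(2 * np.pi * 1500 * t) + 0.05 * rng.standard_normal(8000)

spec_a, spec_b = spectrogram(sig_a), spectrogram(sig_b)
desc_a, desc_b = patch_descriptors(spec_a), patch_descriptors(spec_b)
vocab = build_vocabulary(np.vstack([desc_a, desc_b]))
hist_a = bovw_histogram(desc_a, vocab)
hist_b = bovw_histogram(desc_b, vocab)
```

In a full system, one histogram per audio segment would serve as the feature vector for an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`); that step is omitted here to keep the sketch dependency-free.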
Pages: 14