Investigating the Corpus Independence of the Bag-of-Audio-Words Approach

Cited by: 0
Authors
Vetrab, Mercedes [1 ,2 ]
Gosztolya, Gabor [1 ,2 ]
Affiliations
[1] Univ Szeged, Inst Informat, Arpad Ter 2, Szeged, Hungary
[2] MTA SZTE Res Grp Artificial Intelligence, Tisza Lajos Korut 103, Szeged, Hungary
Source
Keywords
Emotion detection; Bag-of-Audio-words; Human voice; Sound processing
DOI
10.1007/978-3-030-58323-1_31
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we analyze the general applicability of the Bag-of-Audio-Words (BoAW) feature extraction method. This technique allows us to handle the problem of recordings of varying length. The first step of the BoAW method is to define cluster centers (called codewords) over our feature set with an unsupervised training method (such as k-means clustering or even random sampling). This step is normally performed on the training set of the actual database, but this approach has its drawbacks: we have to create new codewords for each data set, which increases the computation time and can lead to over-fitting. Here, we analyze how much the codebook depends on the given corpus. In our experiments, we work with three databases: a Hungarian emotion database, a German emotion database and a general Hungarian speech database. We construct a set of codewords on each of these databases and examine how the classification accuracy scores vary on the Hungarian emotion database. According to our results, the classification performance was similar in each case, which suggests that the Bag-of-Audio-Words codebook is practically corpus-independent. This corpus independence allows us to reuse codebooks created on different datasets, which can make the BoAW method easier to use in practice.
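The BoAW pipeline the abstract describes (build a codebook of codewords, then quantize each frame-level feature vector against it to obtain a fixed-length histogram per recording) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses the random-sampling codebook variant the abstract mentions as an alternative to k-means, and all names, feature dimensions, and data here are hypothetical placeholders.

```python
import numpy as np

def build_codebook(frames, n_codewords, rng):
    # Random-sampling codebook: pick existing frames as codewords
    # (the abstract notes this as an alternative to k-means clustering).
    idx = rng.choice(len(frames), size=n_codewords, replace=False)
    return frames[idx]

def boaw_histogram(frames, codebook):
    # Assign each frame to its nearest codeword (Euclidean distance),
    # then count assignments: a fixed-length vector regardless of
    # how many frames the recording has.
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize so recordings of any length are comparable

rng = np.random.default_rng(0)
# Two "recordings" of different lengths, with 13-dim frame-level features
# (e.g. MFCC-like descriptors; purely synthetic here)
rec_a = rng.normal(size=(120, 13))
rec_b = rng.normal(size=(45, 13))

# Codebook built on one corpus...
codebook = build_codebook(rec_a, n_codewords=16, rng=rng)
vec_a = boaw_histogram(rec_a, codebook)
# ...and reused on another recording, mirroring the paper's finding that
# the codebook is practically corpus-independent.
vec_b = boaw_histogram(rec_b, codebook)
```

Both recordings, despite their different lengths, end up as 16-dimensional normalized histograms suitable for a standard classifier.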
Pages: 285-293
Page count: 9
Related Papers
50 records
  • [1] Bag-of-Audio-Words Approach for Multimedia Event Classification
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2103 - 2106
  • [2] Using the Bag-of-Audio-Words approach for emotion recognition
    Vetrab, Mercedes
    Gosztolya, Gabor
    [J]. ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2022, 14 (01) : 1 - 21
  • [3] SOFTENING QUANTIZATION IN BAG-OF-AUDIO-WORDS
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS
    Pancoast, Stephanie
    Akbacak, Murat
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 778 - 782
  • [5] At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech
    Schmitt, Maximilian
    Ringeval, Fabien
    Schuller, Bjoern
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 495 - 499
  • [6] Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy
    Gosztolya, Gabor
    Busa-Fekete, Robert
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 477 - 488
  • [7] Estimating the degree of conflict in speech by employing Bag-of-Audio-Words and Fisher Vectors
    Gosztolya, Gabor
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 205
  • [8] Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds
    Gosztolya, Gabor
    [J]. INTERSPEECH 2019, 2019, : 2413 - 2417
  • [9] Robust Sound Event Classification Using LBP-HOG Based Bag-of-Audio-Words Feature Representation
    Lim, Hyungjun
    Kim, Myung Jong
    Kim, Hoirin
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3325 - 3329
  • [10] Automatic Audio Recognition for Birds, a Bag of Acoustic Words Approach
    Liu, Feng
    Wang, Cai-qun
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRICAL, CONTROL, AUTOMATION AND ROBOTICS (ECAR 2018), 2018, 307 : 504 - 508