N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS

Authors
Pancoast, Stephanie [1, 2]
Akbacak, Murat [3]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Microsoft, Sunnyvale, CA USA
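Funding
U.S. National Science Foundation
Keywords
Bag-of-audio-words; N-gram models; multimedia event detection
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract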
Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each "word" is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
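As a rough illustration of the representation described in the abstract, the sketch below counts contiguous N-grams over a sequence of audio-word indices and turns them into a normalized histogram. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ngram_counts` and `ngram_histogram`, the L1 normalization, and the toy input are all hypothetical, and the abstract does not specify the acoustic features or the clustering algorithm used to produce the audio words.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(audio_words: Sequence[int], n: int) -> Counter:
    """Count contiguous n-grams in a sequence of audio-word (cluster) indices."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of length n over the sequence; each window is one n-gram.
    return Counter(
        tuple(audio_words[i:i + n]) for i in range(len(audio_words) - n + 1)
    )


def ngram_histogram(
    audio_words: Sequence[int], n: int, codebook_size: int
) -> List[float]:
    """Dense histogram over all codebook_size**n possible n-grams.

    Assumption: counts are L1-normalized so documents of different
    lengths are comparable; the abstract does not state the weighting.
    """
    counts = ngram_counts(audio_words, n)
    total = sum(counts.values())
    hist = [0.0] * (codebook_size ** n)
    for gram, count in counts.items():
        # Map the n-gram to a flat index by reading it as a base-K number.
        index = 0
        for word in gram:
            index = index * codebook_size + word
        hist[index] = count / total
    return hist


# Toy document: six frames already quantized against a 3-word codebook.
words = [0, 1, 1, 2, 0, 1]
unigrams = ngram_counts(words, 1)
bigrams = ngram_counts(words, 2)
bigram_hist = ngram_histogram(words, 2, codebook_size=3)
```

The dense histogram has `codebook_size ** n` bins, so its dimensionality grows quickly with both the codebook size and N. This is one reason the interaction between codebook size and N-gram order is worth studying.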
Pages: 778-782
Page count: 5