N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS

Cited by: 0

Authors:

Pancoast, Stephanie [1, 2]

Akbacak, Murat [3]

Affiliations:

[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA

[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

[3] Microsoft, Sunnyvale, CA USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Funding:

National Science Foundation (USA);

Keywords:

Bag-of-audio-words; N-gram models; multimedia event detection;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each "word" is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
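The N-gram idea in the abstract can be illustrated with a minimal sketch. It assumes the audio has already been quantized into a sequence of integer audio-word indices (for example, by assigning each frame to its nearest cluster centroid), and it builds an L1-normalized histogram over contiguous N-grams. The function names, the toy sequence, and the normalization choice are illustrative assumptions; the abstract does not specify the authors' feature extraction, clustering, or classifier.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(words: Sequence[int], n: int) -> Counter:
    """Count contiguous n-grams in a sequence of audio-word indices."""
    if n < 1:
        raise ValueError("n must be >= 1")
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


def ngram_histogram(words: Sequence[int], n: int, codebook_size: int) -> List[float]:
    """Dense, L1-normalized histogram over all codebook_size**n possible n-grams.

    Assumption: L1 normalization is used here for illustration only.
    """
    counts = ngram_counts(words, n)
    hist = [0.0] * (codebook_size ** n)
    total = sum(counts.values())
    for gram, c in counts.items():
        # Map the n-gram to a flat index by reading it as a base-K number.
        idx = 0
        for w in gram:
            idx = idx * codebook_size + w
        hist[idx] = c / total
    return hist


# Hypothetical document: frame-level audio-word indices from a 3-word codebook.
doc = [0, 1, 1, 2, 0, 1]
unigrams = ngram_histogram(doc, 1, 3)  # plain bag-of-audio-words, 3 bins
bigrams = ngram_histogram(doc, 2, 3)   # bigram extension, 3**2 = 9 bins
```

In this sketch the histogram length grows as the codebook size raised to the power N, which is one reason the choice of codebook size interacts with the N-gram representation.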


Pages: 778-782

Page count: 5

Related records (50 in total):

[1] SOFTENING QUANTIZATION IN BAG-OF-AUDIO-WORDS
Pancoast, Stephanie
Akbacak, Murat
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[2] Investigating the Corpus Independence of the Bag-of-Audio-Words Approach
Vetrab, Mercedes
Gosztolya, Gabor
TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 285 - 293
[3] Bag-of-Audio-Words Approach for Multimedia Event Classification
Pancoast, Stephanie
Akbacak, Murat
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2103 - 2106
[4] Using the Bag-of-Audio-Words approach for emotion recognition
Vetrab, Mercedes
Gosztolya, Gabor
ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2022, 14 (01) : 1 - 21
[5] Sentence Generation from a Bag of Words Using N-gram Model
Yadav, Arun Kumar
Borgohain, Samir Kumar
2014 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2014, : 1771 - 1776
[6] At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech
Schmitt, Maximilian
Ringeval, Fabien
Schuller, Bjoern
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 495 - 499
[7] Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy
Gosztolya, Gabor
Busa-Fekete, Robert
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 477 - 488
[8] Estimating the degree of conflict in speech by employing Bag-of-Audio-Words and Fisher Vectors
Gosztolya, Gábor
Expert Systems with Applications, 2022, 205
[9] Bag-Of-Word normalized n-gram models
Sethy, Abhinav
Ramabhadran, Bhuvana
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1594 - 1597
[10] Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds
Gosztolya, Gabor
INTERSPEECH 2019, 2019, : 2413 - 2417
