N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS

Authors
Pancoast, Stephanie [1, 2]
Akbacak, Murat [3]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[3] Microsoft, Sunnyvale, CA USA
Funding
National Science Foundation (USA)
Keywords
Bag-of-audio-words; N-gram models; multimedia event detection
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each "word" is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
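
The N-gram representation described in the abstract can be illustrated with a minimal sketch. It assumes each audio frame has already been assigned a cluster index (an "audio word") by some clustering step; the function names, the toy sequence, and the codebook size of 4 are hypothetical and are not taken from the paper.

```python
from collections import Counter


def audio_word_ngrams(words, n):
    """Return all runs of n consecutive audio words as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_histogram(words, n, normalize=True):
    """Count the n-grams in one document, optionally as relative frequencies."""
    counts = Counter(audio_word_ngrams(words, n))
    total = sum(counts.values())
    if normalize and total > 0:
        return {gram: count / total for gram, count in counts.items()}
    return dict(counts)


def max_ngram_vocabulary(codebook_size, n):
    """Upper bound on distinct n-grams for a given codebook size."""
    return codebook_size ** n


# Hypothetical document: one audio word index per frame, codebook size 4.
doc = [0, 2, 2, 1, 0, 2]
unigrams = ngram_histogram(doc, 1, normalize=False)
bigrams = ngram_histogram(doc, 2, normalize=False)
```

With n = 1 this reduces to the plain bag-of-audio-words histogram. The number of possible N-grams grows as the codebook size raised to the power n, which is one reason the choice of codebook size interacts with the N-gram order.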
Pages: 778-782
Page count: 5