HMM-based audio keyword generation

被引:0
|
作者
Xu, M [1 ]
Duan, LY
Cai, J
Chia, LT
Xu, CS
Tian, Q
机构
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Inst Infocomm Res, Singapore 119613, Singapore
关键词
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
With the exponential growth in the production creation of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when humans are browsing and understanding video contents. To detect semantic content by useful audio information, we introduce audio keywords which are sets of specific audio sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, a weakness of our previous work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification without any contextual information. In this paper, we propose a classification method based on Hidden Markov Modal (HMM) for audio keyword identification as an improved work instead of using hierarchical SVM classifier. Choosing HMM is motivated by the successful story of HMM in speech recognition. Unlike the frame-based SVM classification followed by major voting, our proposed HMM-based classifiers treat specific sound as a continuous time series data and employ hidden states transition to capture context information. In particular, we study how to find an effective HMM, i.e., determining topology, observation vectors and statistical parameters of HMM. We also compare different HMM structures with different hidden states, and adjust time series data with variable length. Experimental data includes 40 minutes basketball audio which comes from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
引用
收藏
页码:566 / 574
页数:9
相关论文
共 50 条
  • [31] A HMM-BASED METHOD FOR ANOMALY DETECTION
    Wang, Fei
    Zhu, Hongliang
    Tian, Bin
    Xin, Yang
    Niu, Xinxin
    Yang, Yu
    [J]. 2011 4TH IEEE INTERNATIONAL CONFERENCE ON BROADBAND NETWORK AND MULTIMEDIA TECHNOLOGY (4TH IEEE IC-BNMT2011), 2011, : 276 - 280
  • [32] An HMM-based approach to humming transcription
    Shih, HH
    Narayanan, SS
    Kuo, CCJ
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : 337 - 340
  • [33] HMM-based synthesis of creaky voice
    Raitio, Tuomo
    Kane, John
    Drugman, Thomas
    Gobl, Christer
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2315 - +
  • [34] Developing HMM-based recognizers with ESMERALDA
    Fink, GA
    [J]. TEXT, SPEECH AND DIALOGUE, 1999, 1692 : 229 - 234
  • [35] Improvements to HMM-Based Speech Synthesis Based on Parameter Generation with Rich Context Models
    Takamichi, Shinnosuke
    Toda, Tomoki
    Shiga, Yoshinori
    Sakti, Sakriani
    Neubig, Graham
    Nakamura, Satoshi
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 364 - 368
  • [36] Czech HMM-Based Speech Synthesis
    Hanzlicek, Zdenek
    [J]. TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 291 - 298
  • [37] Robustness of HMM-based Speech Synthesis
    Yamagishi, Junichi
    Ling, Zhenhua
    King, Simon
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 581 - 584
  • [38] HMM-based monitoring of packet channels
    Rossi, PS
    Palmieri, F
    Iannello, G
    [J]. HIGH SPEED NETWORKS AND MULTIMEDIA COMMUNICATIONS, PROCEEDINGS, 2004, 3079 : 144 - 154
  • [39] HMM-Based Vietnamese Speech Synthesis
    Trinh Quoc Son
    [J]. 2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2015, : 349 - 353
  • [40] PARAMETER GENERATION ALGORITHM CONSIDERING MODULATION SPECTRUM FOR HMM-BASED SPEECH SYNTHESIS
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4210 - 4214