HMM-based audio keyword generation

Authors
Xu, M [1]
Duan, LY
Cai, J
Chia, LT
Xu, CS
Tian, Q
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Inst Infocomm Res, Singapore 119613, Singapore
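Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract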
With the exponential growth in the production of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when people browse and try to understand video content. To detect semantic content from useful audio information, we introduce audio keywords, which are sets of specific sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. A weakness of that work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification, without any contextual information. In this paper, we propose a classification method based on the Hidden Markov Model (HMM) for audio keyword identification, as an improvement over the hierarchical SVM classifier. The choice of HMM is motivated by its success in speech recognition. Unlike frame-based SVM classification followed by majority voting, the proposed HMM-based classifiers treat a specific sound as continuous time-series data and use hidden state transitions to capture contextual information. In particular, we study how to find an effective HMM, i.e., how to determine its topology, observation vectors, and statistical parameters. We also compare HMM structures with different numbers of hidden states, and adjust time-series data of variable length. The experimental data comprise 40 minutes of basketball audio drawn from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
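The sequence-level classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one HMM per audio keyword, a two-state left-to-right topology, scalar observations with Gaussian emissions, and hand-set parameters, none of which are given in the abstract. The labels `keyword_A` and `keyword_B` and all function names are hypothetical. Each model scores a whole segment with the forward algorithm, and the segment is assigned to the model with the highest log-likelihood, so segments of different lengths are handled without per-frame voting.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    # log(0) is treated as -inf so forbidden transitions stay forbidden.
    return math.log(p) if p > 0 else NEG_INF

def log_gauss(x, mean, var):
    # Log-density of a 1-D Gaussian (assumed emission model).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, hmm):
    """Forward algorithm in the log domain: returns log P(obs | hmm)."""
    start, trans, means, variances = hmm
    n = len(start)
    alpha = [safe_log(start[j]) + log_gauss(obs[0], means[j], variances[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify(obs, models):
    # Pick the keyword whose HMM explains the whole sequence best.
    return max(models, key=lambda name: log_likelihood(obs, models[name]))

# Hypothetical hand-set models: (start probs, transitions, means, variances).
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "keyword_A": ([1.0, 0.0], left_to_right, [0.0, 5.0], [1.0, 1.0]),
    "keyword_B": ([1.0, 0.0], left_to_right, [5.0, 0.0], [1.0, 1.0]),
}

rising = [0.1, -0.2, 0.3, 4.8, 5.1, 5.2]   # low values, then high
falling = [5.0, 4.9, 0.1]                  # high values, then low (shorter)

print(classify(rising, models))
print(classify(falling, models))
```

In practice the parameters would be estimated from labeled training segments (for example with Baum-Welch) and the observations would be multi-dimensional feature vectors; the sketch only shows how hidden-state transitions let a model account for the temporal order of a sound.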
Pages: 566-574
Page count: 9