Audio-visual sports highlights extraction using Coupled Hidden Markov Models

Author
Ziyou Xiong
Affiliation
[1] University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering
Keywords
State Transition Matrix; Interpretable Model Structure; Sport Video; Golf Swing; Average Classification Accuracy
DOI
Not available
Abstract
We present our studies on applying Coupled Hidden Markov Models (CHMMs) to extracting sports highlights from broadcast video using both audio and video information. First, we generate audio labels by classifying the audio with Gaussian mixture models, and video labels by quantizing the average motion vector magnitudes. Then, we model sports highlights with discrete-observation CHMMs trained on audio and video labels obtained from a large training set of broadcast sports highlights. Our experimental results on unseen golf and soccer content show that CHMMs outperform Hidden Markov Models (HMMs) trained on audio-only or video-only observations. Next, we study how the coupling between the two single-modality HMMs improves modelling capability by refining the states of the models. We also show that the number of states optimized in this way gives better classification results than other numbers of states. We conclude that CHMMs are a promising information fusion tool for audio-visual event detection and analysis in the sports domain.
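The abstract describes scoring discrete audio and video label sequences with CHMMs. As a minimal sketch, not the paper's implementation, the Python below assumes a two-chain CHMM in which each chain's next state is conditioned on the previous states of both chains, one trained model per class (for example highlight versus non-highlight), and classification by maximum log-likelihood. The model dictionary layout, the function names `quantize_motion`, `chmm_log_likelihood` and `classify`, and any motion thresholds are hypothetical; the GMM audio classifier and parameter training are not shown, so audio labels are taken as given integers.

```python
import math
from bisect import bisect_right
from itertools import product


def quantize_motion(magnitudes, thresholds):
    """Map average motion-vector magnitudes to discrete video labels.

    thresholds (hypothetical values) must be sorted ascending; the label is the
    number of thresholds <= the magnitude, so labels range over 0..len(thresholds).
    """
    return [bisect_right(thresholds, m) for m in magnitudes]


def chmm_log_likelihood(model, audio_obs, video_obs):
    """Scaled forward algorithm for a two-chain discrete-observation CHMM.

    Hypothetical model layout:
      pi_a[i], pi_v[j]      initial state probabilities of each chain
      A_a[i][j][k]          P(audio state k at t | audio i, video j at t-1)
      A_v[i][j][l]          P(video state l at t | audio i, video j at t-1)
      B_a[k][o], B_v[l][o]  per-chain discrete emission probabilities
    """
    pi_a, pi_v = model["pi_a"], model["pi_v"]
    A_a, A_v, B_a, B_v = model["A_a"], model["A_v"], model["B_a"], model["B_v"]
    pairs = list(product(range(len(pi_a)), range(len(pi_v))))  # joint states
    log_lik = 0.0
    alpha = {}
    for t, (oa, ov) in enumerate(zip(audio_obs, video_obs)):
        new = {}
        for k, l in pairs:
            emit = B_a[k][oa] * B_v[l][ov]
            if t == 0:
                prior = pi_a[k] * pi_v[l]
            else:
                # Coupling: each chain's next state depends on BOTH previous states.
                prior = sum(alpha[i, j] * A_a[i][j][k] * A_v[i][j][l] for i, j in pairs)
            new[k, l] = prior * emit
        scale = sum(new.values())
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        alpha = {s: p / scale for s, p in new.items()}  # rescale to avoid underflow
    return log_lik


def classify(models, audio_obs, video_obs):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: chmm_log_likelihood(models[name], audio_obs, video_obs))
```

For example, `classify({"highlight": m1, "other": m2}, audio_labels, quantize_motion(mags, [1.0, 5.0]))` returns the key of the better-scoring model. Enumerating the joint state space costs O(T(N_a N_v)^2) for T frames and N_a, N_v states per chain, which stays cheap when each chain has only a few states.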
Pages: 62-71
Page count: 9