Temporal Confusion Network for Speech-based Soccer Event Retrieval

Cited by: 0
Authors
Pham, Nhut M. [1]
Vu, Quan H. [1]
Affiliations
[1] VNU HCM, Univ Sci, Artificial Intelligence Lab, Ho Chi Minh City, Vietnam
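Keywords
temporal confusion network; soccer video; event detection; content-based multimedia retrieval; VIDEO; SYSTEM
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract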
This paper introduces the temporal confusion network and its application to speech-based soccer event retrieval, where an event is marked by the announcer's spoken words. A temporal confusion network is a confusion network in which each cluster is marked with temporal information. Since the purpose of soccer event retrieval is to retrieve only the interesting highlights, not the whole video clip, temporal information is crucial for locating them. Expanding the indexing model from 1-best transcriptions to temporal confusion networks improves recall rates for event retrieval. Experiments are conducted on the first round of the World Cup 2010 and the Vietnamese AFF Suzuki Cup 2008 databases. In the best case, an average improvement of 7.1% in recall rate is achieved.
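The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Cluster`, `build_index`, and `one_best`, the toy words, the timestamps, and the posterior values are all assumptions made for illustration. The sketch shows why indexing every hypothesis in a time-stamped cluster, rather than only the 1-best word, can recover an event keyword that the 1-best transcription misses, along with the time span needed to locate the highlight.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cluster:
    """One confusion-network cluster with its time span in seconds (hypothetical layout)."""
    start: float
    end: float
    hypotheses: dict  # competing word -> posterior probability


def build_index(clusters, min_posterior=0.0):
    """Map every hypothesised word to the time spans where it may have been spoken."""
    index = defaultdict(list)
    for c in clusters:
        for word, posterior in c.hypotheses.items():
            if posterior >= min_posterior:
                index[word].append((c.start, c.end, posterior))
    return index


def one_best(clusters):
    """Keep only the most probable word of each cluster (the 1-best transcription)."""
    return [max(c.hypotheses, key=c.hypotheses.get) for c in clusters]


# Toy network: the recogniser prefers "what" over the event keyword "goal".
network = [
    Cluster(12.0, 12.4, {"what": 0.6, "goal": 0.4}),
    Cluster(12.4, 12.9, {"a": 0.9, "the": 0.1}),
    Cluster(12.9, 13.5, {"save": 0.7, "shave": 0.3}),
]

index = build_index(network)
print(one_best(network))  # the keyword "goal" is absent from the 1-best words
print(index["goal"])      # but the full index still locates it in time
```

In this toy case a query for "goal" fails against the 1-best words but succeeds against the full index, which also returns the cluster's time span. A `min_posterior` threshold is one assumed way to trade the extra recall against false hits from low-probability hypotheses.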
Pages: 549-553
Page count: 5