Whisper-based spoken term detection systems for search on speech ALBAYZIN evaluation challenge

Cited by: 0
Authors
Tejedor, Javier [1]
Toledano, Doroteo T. [2]
Affiliations
[1] Univ San Pablo CEU, CEU Univ, Inst Technol, Urbanizac Monteprincipe, Boadilla Del Monte 28668, Spain
[2] Univ Autonoma Madrid, AUDIAS, Elect & Commun Technol Dept, Escuela Politecn Super, Av Francisco Tomas & Valiente 11, Madrid 28049, Spain
Keywords
Search on speech; Spoken term detection; Whisper; ALBAYZIN evaluations; DOCUMENT-RETRIEVAL; KEYWORD SEARCH; QUERY
DOI
10.1186/s13636-024-00334-w
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The vast amount of information stored in audio repositories calls for efficient, automatic methods to search audio content. To that end, search on speech (SoS) has received much attention in recent decades. To encourage the development of automatic systems, the ALBAYZIN evaluations have included a search on speech challenge since 2012. This challenge releases several databases covering different acoustic domains (e.g., spontaneous speech from TV shows, conference talks, and parliament sessions), with the aim of building automatic systems that retrieve a set of terms from those databases. This paper presents a baseline system based on the Whisper automatic speech recognizer for the spoken term detection task in the search on speech challenge held in 2022 within the ALBAYZIN evaluations. This baseline system will be released with this publication and will be given to participants in the upcoming SoS ALBAYZIN evaluation in 2024. Additionally, several analyses based on term properties (in-language versus foreign terms, and single-word versus multi-word terms) are carried out to show Whisper's ability to retrieve terms with specific properties. Although the results obtained for some databases are far from perfect (e.g., for the broadcast news domain), this Whisper-based approach has obtained the best results on the challenge databases so far, so it provides a strong baseline for the upcoming challenge and encourages participants to improve on it.
Pages: 20