Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system

Cited by: 5
Authors
Shi, Gui-Xin [1]
Zhang, Wei-Qiang [1]
Wang, Guan-Bo [1]
Zhao, Jing [1]
Chai, Shu-Zhou [1]
Zhao, Ze-Yu [1]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Dept Elect Engn, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
OpenSAT20; End-to-end ASR; End-to-end KWS; Force alignment; Biased loss; SPEECH RECOGNITION; ENERGY SCORER; SEARCH; ATTENTION;
DOI
10.1186/s13636-021-00212-9
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many end-to-end approaches have been proposed to detect predefined keywords. For multi-keyword scenarios, two bottlenecks remain: (1) the important data that contains keywords is sparsely distributed, and (2) the timestamps of the detected keywords are inaccurate. In this paper, to alleviate the first issue and further improve the performance of the end-to-end ASR front-end, we propose a biased loss function that guides the recognizer to pay more attention to the speech segments containing the predefined keywords. We address the second issue by modifying the forced alignment applied to the end-to-end ASR front-end. To obtain the frame-level alignment, we use a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based acoustic model (AM) as an auxiliary aligner. The proposed system is evaluated in the OpenSAT20 evaluation held by the National Institute of Standards and Technology (NIST). The performance of our end-to-end KWS system is comparable to that of the conventional hybrid KWS system, and sometimes slightly better. With the fused results of the end-to-end and conventional KWS systems, we won first prize in the KWS track. On the dev dataset (a part of the SAFE-T corpus), the system outperforms the baseline by a large margin: our system with the GMM-HMM aligner has lower segmentation-aware word error rates (a relative decrease of 7.9-19.2%) and higher overall actual term-weighted values (a relative increase of 3.6-11.0%), which demonstrates the effectiveness of the proposed method. For more precise alignments, a DNN-based AM can be used as the aligner at the cost of more computation.
Pages: 14
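The abstract describes a biased loss that makes the recognizer attend more to keyword-bearing speech, but it does not give the formula. The sketch below is one plausible form, not the paper's definition: a per-token negative log-likelihood in which tokens belonging to a predefined keyword receive extra weight. The function name `biased_nll`, the `bias` parameter, and the weight-normalized averaging are all assumptions made for illustration.

```python
import math


def biased_nll(token_log_probs, is_keyword_token, bias=1.0):
    """Weighted negative log-likelihood over one transcript.

    token_log_probs  : log-probability the model assigns to each reference token
    is_keyword_token : True where the reference token is part of a keyword
    bias             : extra weight for keyword tokens (hypothetical knob;
                       bias=0.0 gives the ordinary mean NLL)
    """
    if len(token_log_probs) != len(is_keyword_token):
        raise ValueError("inputs must have the same length")
    if not token_log_probs:
        raise ValueError("empty sequence")
    # Keyword tokens count (1 + bias) times, all other tokens once.
    weights = [1.0 + bias if kw else 1.0 for kw in is_keyword_token]
    total = -sum(w * lp for w, lp in zip(weights, token_log_probs))
    # Normalize by the total weight so the loss scale stays comparable
    # across utterances with different numbers of keyword tokens.
    return total / sum(weights)


# Two tokens; the second one belongs to a keyword.
log_probs = [math.log(0.5), math.log(0.25)]
flags = [False, True]
plain = biased_nll(log_probs, flags, bias=0.0)
biased = biased_nll(log_probs, flags, bias=1.0)
```

In this toy case the keyword token is the poorly predicted one, so the biased loss comes out larger than the plain one, which is the intended effect: errors on keyword tokens contribute more to the gradient.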