Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition

Authors
Shen, Peng [1]
Lu, Xugang [1]
Kawai, Hisashi [1]
Affiliation
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Keywords
Acoustic segmentation; acoustic event detection; language model; N-best list rescore
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Speech segmentation is important in automatic speech recognition (ASR) and machine translation (MT). In N-best list rescoring in particular, generating N-best lists that contain as many candidates as possible from a decoding lattice requires proper utterance segmentation. In lecture speech recognition, only long audio recordings are provided, without any utterance segmentation information. Moreover, the recordings contain not only speech but also other acoustic events, e.g., laughter and applause. Traditional speech segmentation algorithms for ASR focus on acoustic cues, while text segmentation algorithms for MT rely mainly on linguistic cues. In this study, we propose a three-stage speech segmentation framework that integrates both acoustic and linguistic cues. We tested the segmentation framework on lecture speech recognition, and the results showed that the proposed segmentation algorithm is effective.
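For readers unfamiliar with N-best list rescoring, the general idea can be sketched as follows. This is a generic illustration, not the method of the paper: the `Hypothesis` class, the `rescore` function, the weights, and all scores are hypothetical and chosen only to show how a second-pass language model score can reorder first-pass candidates for one segmented utterance.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-domain score from the first-pass decoder (made-up value)
    lm_score: float        # log-probability from the rescoring language model (made-up value)


def rescore(nbest, lm_weight=1.0, word_penalty=0.0):
    """Return the hypotheses sorted by combined score, best first."""
    def combined(h):
        n_words = len(h.text.split())
        return h.acoustic_score + lm_weight * h.lm_score + word_penalty * n_words

    return sorted(nbest, key=combined, reverse=True)


# Hypothetical 2-best list for a single utterance segment.
nbest = [
    Hypothesis("recognize speech", acoustic_score=-120.0, lm_score=-14.0),
    Hypothesis("wreck a nice beach", acoustic_score=-118.0, lm_score=-25.0),
]

best_with_lm = rescore(nbest, lm_weight=1.0)[0]
best_acoustic_only = rescore(nbest, lm_weight=0.0)[0]
```

With the language model weight set to zero, the ranking follows the acoustic score alone; with the weight at 1.0, the language model score changes which candidate comes first. Because rescoring operates on one utterance at a time, the quality of the list depends on where the long recording was cut, which is why segmentation matters here.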
Pages: 5