Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition

Cited by: 0
Authors
Shen, Peng [1]
Lu, Xugang [1]
Kawai, Hisashi [1]
Affiliation
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Keywords
Acoustic segmentation; acoustic event detection; language model; N-best list rescore
DOI
Not available
CLC classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Speech segmentation is important in automatic speech recognition (ASR) and machine translation (MT). In particular, N-best list rescoring requires proper utterance segmentation to generate N-best lists with as many candidates as possible from a decoding lattice. In lecture speech recognition, only long audio recordings are available, without any utterance segmentation information. Moreover, the recordings contain not only speech but also other acoustic events such as laughter and applause. Traditional segmentation algorithms for ASR rely mainly on acoustic cues, whereas speech-text segmentation algorithms for MT focus on linguistic cues. In this study, we propose a three-stage speech segmentation framework that integrates both acoustic and linguistic cues. We evaluated the framework on lecture speech recognition, and the results demonstrate the effectiveness of the proposed segmentation algorithm.
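The abstract only outlines the pipeline at a high level. As a rough illustration of how such a three-stage design could be wired together, here is a minimal Python sketch: Stage 1 cuts candidate segments from frame energies (acoustic cue), Stage 2 merges fragments and filters out non-speech events such as applause (acoustic cue), Stage 3 refines a segment boundary with a language-model score (linguistic cue), and the resulting N-best lists are rescored with an interpolated acoustic/LM score. All function names, thresholds, the toy event classifier, and the toy language model below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# ---- Stage 1: acoustic cue - energy-based candidate segmentation -----------
def stage1_energy_segments(frame_energy, threshold=0.1):
    """Return (start, end) frame ranges whose energy exceeds a threshold."""
    active = frame_energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments

# ---- Stage 2: acoustic cue - merge fragments, drop non-speech events -------
def stage2_merge_and_filter(segments, is_speech, max_gap=30, min_len=20):
    """Merge segments separated by short pauses; keep only segments that the
    caller-supplied classifier labels as speech (not laughter, applause, ...)."""
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [s for s in merged if s[1] - s[0] >= min_len and is_speech(s)]

# ---- Stage 3: linguistic cue - pick the split point the LM prefers ---------
def stage3_refine_boundary(words_left, words_right, lm_score, window=3):
    """Try moving up to `window` words across the boundary between two
    adjacent segments and keep the split with the best total LM score."""
    best = (words_left, words_right)
    best_score = lm_score(words_left) + lm_score(words_right)
    for k in range(1, window + 1):
        if k <= len(words_left):                      # shift k words rightward
            cand = (words_left[:-k], words_left[-k:] + words_right)
            s = lm_score(cand[0]) + lm_score(cand[1])
            if s > best_score:
                best, best_score = cand, s
    return best

# ---- N-best rescoring with an interpolated acoustic + LM score -------------
def rescore_nbest(nbest, lm_score, lm_weight=0.8):
    """nbest: list of dicts {'words': [...], 'am_score': float}, higher = better."""
    return sorted(nbest,
                  key=lambda h: h["am_score"] + lm_weight * lm_score(h["words"]),
                  reverse=True)

if __name__ == "__main__":
    # Synthetic frame energies: speech, pause, speech, long silence, applause-like burst.
    energy = np.concatenate([np.full(50, 0.6), np.zeros(10), np.full(40, 0.5),
                             np.zeros(80), np.full(25, 0.9)])
    segs = stage1_energy_segments(energy)
    # Toy event classifier: treat the final high-energy burst as non-speech.
    speech = stage2_merge_and_filter(segs, is_speech=lambda s: s[0] < 150)
    print("speech segments (frames):", speech)

    # Toy "LM": unigram log-probs plus a penalty when a segment ends right
    # before a likely continuation, so that the split point actually matters.
    logp = {"we": -1.0, "discuss": -2.0, "the": -0.5,
            "segmentation": -2.5, "problem": -2.0}
    bad_final = {"we", "the", "a"}                    # unlikely utterance-final words
    def lm(words):
        score = sum(logp.get(w, -5.0) for w in words)
        if words and words[-1] in bad_final:
            score -= 3.0
        return score

    print(stage3_refine_boundary(["we", "discuss", "the"],
                                 ["segmentation", "problem"], lm))
    print(rescore_nbest([{"words": ["the", "segmentation", "problem"], "am_score": -3.0},
                         {"words": ["the", "segment", "station", "problem"], "am_score": -2.5}],
                        lm))
```

In the setting described by the abstract, the energy detector and the toy classifier would be replaced by a proper acoustic event detector, and the toy LM by the rescoring language model; the sketch only shows where the acoustic and linguistic cues enter the pipeline.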
Pages: 5