Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition

Cited by: 0
Authors
Shen, Peng [1]
Lu, Xugang [1]
Kawai, Hisashi [1]
Affiliation
[1] National Institute of Information and Communications Technology (NICT), Tokyo, Japan
Keywords
Acoustic segmentation; acoustic event detection; language model; N-best list rescoring
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Speech segmentation is important in automatic speech recognition (ASR) and machine translation (MT). In N-best list rescoring in particular, generating N-best lists that contain as many candidates as possible from a decoding lattice requires proper utterance segmentation. In lecture speech recognition, only long audio recordings are provided, without any utterance segmentation information. Moreover, besides speech, the recordings contain other acoustic events such as laughter and applause. Traditional speech segmentation algorithms for ASR rely mainly on acoustic cues, whereas segmentation algorithms for MT, which operate on speech transcripts, focus on linguistic cues. In this study, we propose a three-stage speech segmentation framework that integrates both acoustic and linguistic cues. We evaluated the framework on lecture speech recognition, and the results showed that the proposed segmentation algorithm is effective.
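The abstract does not describe the three stages in detail. The Python sketch that follows illustrates one possible way to combine the cues it names (non-speech event removal, an acoustic pause cue, a linguistic sentence-end cue, and N-best rescoring); it is not the authors' method. All names (`Event`, `Pause`, `speech_regions`, `boundary_score`, `split_region`, `segment`, `rescore`) and parameters (`weight`, `max_pause`, `threshold`, `lm_weight`) are hypothetical, and the sketch assumes that an acoustic event detector and a language model already supply event labels, pause durations and sentence-end probabilities.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


@dataclass
class Event:
    """Output of a hypothetical acoustic event detector."""
    start: float
    end: float
    label: str  # e.g. "speech", "laughter", "applause"


@dataclass
class Pause:
    """A candidate boundary inside speech (hypothetical inputs)."""
    time: float               # pause position in seconds
    duration: float           # pause length in seconds (acoustic cue)
    sentence_end_prob: float  # assumed LM probability of a sentence end here (linguistic cue)


def speech_regions(events: List[Event]) -> List[Segment]:
    """Assumed stage 1: drop non-speech events such as laughter or applause."""
    return [(e.start, e.end) for e in events if e.label == "speech"]


def boundary_score(p: Pause, max_pause: float = 1.0, weight: float = 0.5) -> float:
    """Assumed stage 2: linearly interpolate the normalized pause length
    (acoustic cue) with the sentence-end probability (linguistic cue)."""
    acoustic = min(p.duration / max_pause, 1.0)
    return weight * acoustic + (1.0 - weight) * p.sentence_end_prob


def split_region(region: Segment, pauses: List[Pause], threshold: float) -> List[Segment]:
    """Assumed stage 3: cut a speech region at every interior pause
    whose boundary score reaches the threshold."""
    start, end = region
    cuts = sorted(p.time for p in pauses
                  if start < p.time < end and boundary_score(p) >= threshold)
    bounds = [start] + cuts + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def segment(events: List[Event], pauses: List[Pause], threshold: float = 0.5) -> List[Segment]:
    """Run the three assumed stages and return utterance-level segments."""
    segments: List[Segment] = []
    for region in speech_regions(events):
        segments.extend(split_region(region, pauses, threshold))
    return segments


def rescore(nbest: List[Tuple[str, float, float]], lm_weight: float = 0.8) -> str:
    """Pick the hypothesis (text, acoustic log score, LM log score) that maximizes
    acoustic + lm_weight * LM, a common log-linear combination in N-best rescoring."""
    return max(nbest, key=lambda h: h[1] + lm_weight * h[2])[0]


if __name__ == "__main__":
    events = [Event(0.0, 12.0, "speech"), Event(12.0, 15.0, "applause"),
              Event(15.0, 30.0, "speech")]
    pauses = [Pause(4.0, 0.8, 0.9), Pause(8.0, 0.2, 0.1), Pause(20.0, 0.6, 0.5)]
    print(segment(events, pauses))
    # [(0.0, 4.0), (4.0, 12.0), (15.0, 20.0), (20.0, 30.0)]
    print(rescore([("hyp a", -10.0, -5.0), ("hyp b", -11.0, -3.0)]))
    # hyp b
```

In this toy example the applause region is discarded, the short pause at 8.0 s with a low sentence-end probability is not used as a boundary, and each resulting segment could then be decoded and rescored independently. The linear interpolation in `boundary_score` is only one simple way to fuse the two cues; a trained classifier over the same features would be an equally plausible choice.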
Pages: 5