N-BEST ENTROPY BASED DATA SELECTION FOR ACOUSTIC MODELING

Cited by: 0
Authors
Itoh, Nobuyasu [1 ]
Sainath, Tara N. [2 ]
Liang, Dan Ning [3 ]
Zhou, Lie [3 ]
Ramabhadran, Bhuvana [2 ]
Affiliations
[1] IBM Japan Ltd, IBM Res Tokyo, Yamato 2428502, Japan
[2] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[3] IBM Res Corp, Beijing 100193, Peoples R China
Keywords
N-best entropy; Acoustic modeling; Active learning; Data selection; Speech recognition;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a strategy for efficiently selecting informative data from large corpora of untranscribed speech. Confidence-based selection methods (i.e., selecting the utterances we are least confident about) have been a popular approach; however, they consider only the top hypothesis when selecting utterances and tend to select outliers, so they do not always improve overall recognition accuracy. Alternatively, we propose a method that selects data by looking at competing hypotheses, computing the entropy of the N-best hypotheses decoded by the baseline acoustic model. In addition, we address the issue of outliers by calculating how representative a specific utterance is of all other unselected utterances via a tf-idf score. Experiments show that N-best entropy based selection (5.8% relative improvement on a 400-hour corpus) outperformed the conventional confidence-based and lattice-entropy-based selection strategies, and that tf-idf-based representativeness improved the model further (6.2% relative). A comparison with random selection is also presented. Finally, the impact of model size is discussed.
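The core selection step described in the abstract can be illustrated with a minimal sketch (not taken from the paper): each untranscribed utterance is scored by the entropy of the posterior distribution over its N-best hypotheses, and the highest-entropy utterances are selected for transcription. The function names, the score scaling, and the simple budget-based ranking below are illustrative assumptions, and the tf-idf representativeness term is omitted.

```python
import math

def nbest_entropy(log_scores, scale=1.0):
    """Entropy of the posterior over one utterance's N-best hypotheses.

    log_scores: decoder log-scores of the N-best hypotheses for one utterance.
    scale: scaling factor applied before normalization (assumed here).
    """
    # Turn scaled log-scores into a posterior over the N-best list (softmax).
    scaled = [scale * s for s in log_scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    posts = [e / z for e in exps]
    # High entropy means many closely competing hypotheses, i.e. an
    # utterance the baseline acoustic model is uncertain about.
    return -sum(p * math.log(p) for p in posts if p > 0)

def select_utterances(nbest_scores_by_utt, budget):
    """Rank utterances by N-best entropy and keep the top `budget` of them."""
    ranked = sorted(nbest_scores_by_utt.items(),
                    key=lambda kv: nbest_entropy(kv[1]),
                    reverse=True)
    return [utt_id for utt_id, _ in ranked[:budget]]

# Example: "utt2" has closely competing hypotheses, so it is selected first.
scores = {"utt1": [-10.0, -25.0, -30.0], "utt2": [-12.0, -12.3, -12.5]}
print(select_utterances(scores, budget=1))  # ['utt2']
```

In the paper's full method the entropy-based ranking is additionally weighted by how representative each utterance is of the remaining unselected data (via a tf-idf score), which counteracts the tendency of pure uncertainty-based selection to pick outliers.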
Pages: 4133-4136
Number of pages: 4
Related papers
50 records in total
  • [1] SEARCH RESULTS BASED N-BEST HYPOTHESIS RESCORING WITH MAXIMUM ENTROPY CLASSIFICATION
    Peng, Fuchun
    Roy, Scott
    Shahshahani, Ben
    Beaufays, Francoise
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 422 - 427
  • [2] Improved N-Best Extraction with an Evaluation on Language Data
    Bjoerklund, Johanna
    Drewes, Frank
    Jonsson, Anna
    COMPUTATIONAL LINGUISTICS, 2022, 48 (01) : 119 - 153
  • [3] HALLUCINATED N-BEST LISTS FOR DISCRIMINATIVE LANGUAGE MODELING
    Sagae, K.
    Lehr, M.
    Prud'hommeaux, E.
    Xu, P.
    Glenn, N.
    Karakos, D.
    Khudanpur, S.
    Roark, B.
    Saraclar, M.
    Shafran, I.
    Bikel, D.
    Callison-Burch, C.
    Cao, Y.
    Hall, K.
    Hasler, E.
    Koehn, P.
    Lopez, A.
    Post, M.
    Riley, D.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5001 - 5004
  • [4] Automatic acoustic segmentation in N-best list rescoring for lecture speech recognition
    Shen, Peng
    Lu, Xugang
    Kawai, Hisashi
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [5] DISCRIMINATIVE RECOGNITION RATE ESTIMATION FOR N-BEST LIST AND ITS APPLICATION TO N-BEST RESCORING
    Ogawa, Atsunori
    Hori, Takaaki
    Nakamura, Atsushi
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6832 - 6836
  • [6] Channel Selection Using N-Best Hypothesis for Multi-Microphone ASR
    Wolf, Martin
    Nadeu, Climent
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3474 - 3478
  • [7] Correcting, Rescoring and Matching: An N-best List Selection Framework for Speech Recognition
    Kuo, Chin-Hung
    Chen, Kuan-Yu
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 729 - 734
  • [8] ASR N-BEST FUSION NETS
    Liu, Xinyue
    Li, Mingda
    Chen, Luoxin
    Wanigasekara, Prashan
    Ruan, Weitong
    Khan, Haidar
    Hamza, Wael
    Su, Chengwei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7618 - 7622
  • [9] Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
    Cui, Xiaodong
    Saon, George
    Nagano, Tohru
    Suzuki, Masayuki
    Fukuda, Takashi
    Kingsbury, Brian
    Kurata, Gakuto
    INTERSPEECH 2022, 2022, : 2638 - 2642