Unsupervised language model adaptation based on automatic text collection from WWW

Cited by: 0
Authors
Suzuki, Motoyuki [1]
Kajiura, Yasutomo [1]
Ito, Akinori [1]
Makino, Shozo [1]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
Keywords
language model adaptation; world wide web; search engine; Google; query-based sampling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An n-gram language model trained on a general corpus performs well, but a topic-specialized n-gram is known to outperform the general one. Several adaptation methods have been proposed for building a topic-specialized n-gram. These methods either use a given corpus for the target topic or collect topic-related documents from a database. When neither such a corpus nor topic-related documents in the database are available, the general n-gram cannot be adapted to the topic. This paper proposes a new unsupervised adaptation method that collects topic-related documents from the World Wide Web. Several query terms are extracted from the recognized text, and the web pages returned by a search engine are used for adaptation. In experiments, the proposed method achieved a word accuracy 7.2 points higher than that of the general n-gram.
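The pipeline outlined in the abstract (query terms from recognized text, web pages from a search engine, adaptation of the general model) can be illustrated with a minimal Python sketch. The abstract does not state how query terms are chosen or how the models are combined, so everything here is an assumption made for illustration: the frequency-ratio score in `select_query_terms`, the linear interpolation in `interpolate`, the use of unigrams instead of full n-grams, and the caller-supplied `search` function standing in for a real search engine.

```python
from collections import Counter
import math


def select_query_terms(recognized_words, general_unigram, k=3, floor=1e-6):
    """Pick k words that are unusually frequent in the recognized text.

    Hypothetical criterion: p_text * log(p_text / p_general). Words missing
    from the general model get the probability `floor`.
    """
    counts = Counter(recognized_words)
    total = sum(counts.values())

    def score(word):
        p_text = counts[word] / total
        p_general = general_unigram.get(word, floor)
        return p_text * math.log(p_text / p_general)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-score(w), w))
    return ranked[:k]


def train_unigram(words):
    """Maximum-likelihood unigram model (no smoothing) from a word list."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(general, topic, weight=0.5):
    """Linear interpolation of two unigram models (assumed adaptation rule)."""
    vocab = set(general) | set(topic)
    return {
        w: (1 - weight) * general.get(w, 0.0) + weight * topic.get(w, 0.0)
        for w in vocab
    }


def adapt(recognized_words, general_unigram, search, k=3, weight=0.5):
    """Run one adaptation pass.

    `search` is a placeholder: it takes a list of query terms and returns
    a list of pages, each page being a list of words.
    """
    queries = select_query_terms(recognized_words, general_unigram, k)
    topic_words = [w for page in search(queries) for w in page]
    if not topic_words:
        # Nothing was collected, so the general model is returned unchanged.
        return dict(general_unigram)
    return interpolate(general_unigram, train_unigram(topic_words), weight)
```

In this sketch the recognized text would come from a first recognition pass with the general model, and the adapted model would be used for a second pass. A real system would work with higher-order n-grams and smoothing, which are omitted here.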
Pages: 2202-2205
Page count: 4
Related papers
50 in total
  • [21] Unsupervised Text Segmentation Based on Native Language Characteristics
    Malmasi, Shervin
    Dras, Mark
    Johnson, Mark
    Du, Lan
    Wolska, Magdalena
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1457 - 1469
  • [22] LANGUAGE MODEL ADAPTATION FOR AUTOMATIC CALL TRANSCRIPTION
    Haznedaroglu, Ali
    Arslan, Levent M.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [23] UNSUPERVISED CV LANGUAGE MODEL ADAPTATION BASED ON DIRECT LIKELIHOOD MAXIMIZATION SENTENCE SELECTION
    Shinozaki, Takahiro
    Horiuchi, Yasuo
    Kuroiwa, Shingo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5029 - 5032
  • [24] Unsupervised language model adaptation via topic modeling based on named entity hypotheses
    Liu, Yang
    Liu, Feifan
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4921 - 4924
  • [25] Retrieval-based language model adaptation for handwritten Chinese text recognition
    Hu, Shuying
    Wang, Qiufeng
    Huang, Kaizhu
    Wen, Min
    Coenen, Frans
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2023, 26 (02) : 109 - 119
  • [27] Good-turing estimation from word lattices for unsupervised language model adaptation
    Riley, M
    Roark, B
    Sproat, R
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 453 - 458
  • [28] Unsupervised Language Model Adaptation for Mandarin Broadcast Conversation Transcription
    Mrva, David
    Woodland, Philip C.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2210 - 2213
  • [29] Unsupervised Language Model Adaptation Using Latent Semantic Marginals
    Tam, Yik-Cheung
    Schultz, Tanja
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2206 - 2209
  • [30] Improving Unsupervised Language Model Adaptation with Discriminative Data Filtering
    Chang, Shuangyu
    Levit, Michael
    Parthasarathy, Partha
    Dumoulin, Benoit
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1207 - 1211