Unsupervised language model adaptation based on automatic text collection from WWW

Cited by: 0
Authors
Suzuki, Motoyuki [1]
Kajiura, Yasutomo [1]
Ito, Akinori [1]
Makino, Shozo [1]
Affiliation
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
Keywords
language model adaptation; world wide web; search engine; Google; query-based sampling
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An n-gram language model trained on a general corpus gives high performance. However, it is well known that a topic-specific n-gram outperforms the general n-gram. Several adaptation methods have been proposed to build a topic-specific n-gram. These methods either use a given corpus on the target topic or collect topic-related documents from a database. If neither such a corpus nor topic-related documents in the database are available, the general n-gram cannot be adapted to the topic. This paper proposes a new unsupervised adaptation method that collects topic-related documents from the World Wide Web. Several query terms are extracted from the recognized text, and the web pages returned by a search engine are used for adaptation. In experiments, the proposed method gave word accuracy 7.2 points higher than that of the general n-gram.
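The pipeline described in the abstract (extract query terms from the recognized text, collect web pages, adapt the model) can be illustrated with a minimal sketch. This record does not specify the paper's term-selection criterion or adaptation formula, so everything below is an assumption made for illustration: query terms are ranked by how over-represented they are relative to a background frequency table, unigram models stand in for full n-grams, and adaptation is plain linear interpolation. The function names, the `background_freq` table, and the weight `lam` are hypothetical, and the web search step is left out (the collected text is simply passed in as a word list).

```python
import math
from collections import Counter


def select_query_terms(recognized_words, background_freq, k=3):
    """Rank words by over-representation against a background table (assumed heuristic)."""
    counts = Counter(recognized_words)
    total = sum(counts.values())

    def score(word):
        p_doc = counts[word] / total
        # Words missing from the background table get a tiny assumed probability.
        p_bg = background_freq.get(word, 1e-6)
        return p_doc * math.log(p_doc / p_bg)

    return sorted(counts, key=score, reverse=True)[:k]


def unigram_model(words):
    """Maximum-likelihood unigram probabilities (a stand-in for a full n-gram)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(general, topic, lam=0.5):
    """Linearly interpolate two models; lam is the weight on the topic model."""
    vocab = set(general) | set(topic)
    return {
        w: lam * topic.get(w, 0.0) + (1.0 - lam) * general.get(w, 0.0)
        for w in vocab
    }
```

In such a sketch, the terms returned by `select_query_terms` would be sent to a search engine, the text of the returned pages would be tokenized and passed to `unigram_model`, and the result would be mixed with the general model via `interpolate`. The interpolation weight would normally be tuned on held-out data.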
Pages: 2202-2205
Page count: 4
Related papers
50 in total
  • [1] Unsupervised language model adaptation for handwritten Chinese text recognition
    Wang, Qiu-Feng
    Yin, Fei
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2014, 47 (03) : 1202 - 1216
  • [2] Unsupervised language model adaptation
    Bacchiani, M
    Roark, B
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 224 - 227
  • [3] An unsupervised Web-based topic language model adaptation method
    Lecorve, Gwenole
    Gravier, Guillaume
    Sebillot, Pascale
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5081 - 5084
  • [4] Supervised and unsupervised Web-based language model domain adaptation
    Lecorve, Gwenole
    Dines, John
    Hain, Thomas
    Motlicek, Petr
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 182 - 185
  • [5] Unsupervised language model adaptation for broadcast news
    Chen, LZ
    Gauvain, JL
    Lamel, L
    Adda, G
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 220 - 223
  • [6] Unsupervised language model adaptation for meeting recognition
    Tur, Gokhan
    Stolcke, Andreas
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 173 - +
  • [7] Research on automatic text classification based on a hybrid language model
    Zheng, De-Quan
    Li, Sheng
    Zhao, Tie-Jun
    Yu, Hao
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2007, 29 (03) : 601 - 605
  • [8] Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition
    Ito, Akinori
    Kajiura, Yasutomo
    Suzuki, Motoyuki
    Makino, Shozo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,
  • [9] Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition
    Akinori Ito
    Yasutomo Kajiura
    Motoyuki Suzuki
    Shozo Makino
    EURASIP Journal on Audio, Speech, and Music Processing, 2009
  • [10] Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0
    Schlippe, Tim
    Gren, Lukasz
    Vu, Ngoc Thang
    Schultz, Tanja
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2697 - 2701