Supervised and unsupervised Web-based language model domain adaptation

Authors
Lecorve, Gwenole [1]
Dines, John [1]
Hain, Thomas
Motlicek, Petr [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Keywords
Language model; domain adaptation; supervision; Web data
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Domain adaptation of a language model (LM) consists of re-estimating the probabilities of a baseline LM to better match the specifics of a given broad topic of interest. A common strategy is to retrieve adaptation texts from the Web based on a domain-representative seed text. In this paper, we study how the choice of this seed text influences the adaptation process and the performance of the resulting adapted language models in automatic speech recognition. More precisely, we analyze how our Web-based adaptation approach differs between the supervised case, in which the seed text is manually generated, and the unsupervised case, in which the seed text is an automatic transcript. Experiments were carried out on data from a real-world use case, namely videos produced for a university YouTube channel. Results show that the approach is quite robust: unsupervised adaptation performs similarly to the supervised case in terms of overall perplexity and word error rate.
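The two evaluation measures named in the abstract, perplexity and word error rate, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `word_error_rate` and `perplexity` are hypothetical, the example sentences are made up, and the sketch assumes whitespace-tokenized text and per-word probabilities that some language model has already produced.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_probs: list[float]) -> float:
    """Exponential of the average negative log-probability per word."""
    if not word_probs:
        raise ValueError("word_probs must not be empty")
    total_log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-total_log_prob / len(word_probs))


if __name__ == "__main__":
    # Made-up example: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # A model that gives every word probability 0.25 has perplexity 4.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better for both measures. In a setting like the one described, each adapted LM would be scored on the same held-out transcripts so that the supervised and unsupervised cases can be compared directly.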
Pages: 182-185
Number of pages: 4