Modeling actions of PubMed users with n-gram language models

Cited by: 15
Authors
Lin, Jimmy [1 ,2 ]
Wilbur, W. John [2 ]
Affiliations
[1] Univ Maryland, Coll Informat Studies, iSch, College Pk, MD 20742 USA
[2] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
Source
INFORMATION RETRIEVAL | 2009 / Volume 12 / Issue 04
Keywords
Search behavior; Query log analysis; SEARCH; STRATEGIES; PATTERNS; LIFE; WEB;
DOI
10.1007/s10791-008-9067-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Transaction logs from online search engines are valuable for two reasons: First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies on general Web search, our work examines user activities with a highly-specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide an enabler for improving users' search experience.
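
As a rough illustration of the approach the abstract describes (actions encoded as symbol strings, an n-gram model fit to them, and perplexity as the evaluation measure), the following is a minimal sketch and not the authors' implementation. The action alphabet (`Q` for a query, `R` for viewing a record, `N` for paging to the next result page), the toy sessions, the bigram order, and the add-alpha smoothing are all assumptions made for this example; the abstract does not specify any of them.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical session boundary markers


def train_bigram(sessions, alpha=1.0):
    """Fit a bigram model with add-alpha smoothing; return prob(prev, cur)."""
    # Symbols that can be predicted: every observed action plus the end marker.
    vocab = {END}
    for session in sessions:
        vocab.update(session)

    bigrams = Counter()   # counts of (previous symbol, current symbol)
    contexts = Counter()  # counts of the previous symbol as a context
    for session in sessions:
        symbols = [START] + list(session) + [END]
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    size = len(vocab)

    def prob(prev, cur):
        # Smoothed conditional probability P(cur | prev).
        return (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * size)

    return prob


def perplexity(prob, sessions):
    """Perplexity of the model over held-out sessions (base-2 logs)."""
    log_sum, count = 0.0, 0
    for session in sessions:
        symbols = [START] + list(session) + [END]
        for prev, cur in zip(symbols, symbols[1:]):
            log_sum += math.log2(prob(prev, cur))
            count += 1
    return 2 ** (-log_sum / count)


# Toy sessions in the assumed alphabet: Q = query, R = view record, N = next page.
model = train_bigram(["QRQR", "QNR", "QQR"])
print(perplexity(model, ["QR", "QNR"]))
```

Lower perplexity on held-out sessions means the model finds the observed action sequences less surprising. This sketch assumes every held-out symbol also appears in the training data; a symbol outside the training vocabulary would need explicit handling.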
Pages: 487-503
Page count: 17
Related papers
50 in total
  • [1] Modeling actions of PubMed users with n-gram language models
    Jimmy Lin
    W. John Wilbur
    [J]. Information Retrieval, 2009, 12 : 487 - 503
  • [2] On compressing n-gram language models
    Hirsimaki, Teemu
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 949 - 952
  • [3] Discriminative n-gram language modeling
    Roark, Brian
    Saraclar, Murat
    Collins, Michael
    [J]. COMPUTER SPEECH AND LANGUAGE, 2007, 21 (02): : 373 - 392
  • [4] Perplexity of n-Gram and Dependency Language Models
    Popel, Martin
    Marecek, David
    [J]. TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 173 - 180
  • [5] MIXTURE OF MIXTURE N-GRAM LANGUAGE MODELS
    Sak, Hasim
    Allauzen, Cyril
    Nakajima, Kaisuke
    Beaufays, Francoise
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 31 - 36
  • [6] Discriminative N-gram Language Modeling for Turkish
    Arisoy, Ebru
    Roark, Brian
    Shafran, Izhak
    Saraclar, Murat
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 825 - +
  • [7] MLP emulation of N-gram models as a first step to connectionist language modeling
    Castro, MJ
    Prat, F
    Casacuberta, F
    [J]. NINTH INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS (ICANN99), VOLS 1 AND 2, 1999, (470): : 910 - 915
  • [8] Bayesian learning of n-gram statistical language modeling
    Bai, Shuanhu
    Li, Haizhou
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1045 - 1048
  • [9] Profile based compression of n-gram language models
    Olsen, Jesper
    Oria, Daniela
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 1041 - 1044
  • [10] Improved N-gram Phonotactic Models For Language Recognition
    BenZeghiba, Mohamed Faouzi
    Gauvain, Jean-Luc
    Lamel, Lori
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2718 - 2721