PubMed related articles: a probabilistic topic-based model for content similarity

Cited by: 126
Authors
Lin, Jimmy [1, 2]
Wilbur, W. John [2]
Affiliations
[1] Univ Maryland, Coll Informat Studies, College Pk, MD 20742 USA
[2] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
Keywords
Information Retrieval; MeSH; Retrieval Model; Related Article; Test Collection
DOI
10.1186/1471-2105-8-423
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®. Results: The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. Conclusion: Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search.
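The abstract's statement that topicality is computed from term frequencies modeled as Poisson distributions can be illustrated with a generic two-Poisson mixture. The following is only a minimal sketch under stated assumptions, not the formula from the paper: the function names, the two rate parameters, and the prior are hypothetical, and the sketch ignores document length. It applies Bayes' rule to ask how likely a document is to be "about" a topic given that a term occurs k times.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k occurrences under a Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def prob_about_topic(k: int, topical_rate: float, background_rate: float,
                     prior: float) -> float:
    """Posterior probability that a document is about a topic, given that the
    topic's term occurs k times.

    Assumptions (hypothetical, for illustration only):
      - term counts in on-topic documents follow Poisson(topical_rate)
      - term counts in off-topic documents follow Poisson(background_rate)
      - prior is the fraction of documents that are on-topic
    """
    on_topic = prior * poisson_pmf(k, topical_rate)
    off_topic = (1.0 - prior) * poisson_pmf(k, background_rate)
    return on_topic / (on_topic + off_topic)


if __name__ == "__main__":
    # Illustrative parameter values, not estimates from any real collection.
    for count in range(6):
        p = prob_about_topic(count, topical_rate=3.0,
                             background_rate=0.5, prior=0.1)
        print(f"term count {count}: P(about topic) = {p:.4f}")
```

When the topical rate exceeds the background rate, the posterior rises with the term count, which is the intuition behind using term frequency as evidence of topicality.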
Pages: 14
Related papers
50 items in total
  • [1] PubMed related articles: a probabilistic topic-based model for content similarity
    Jimmy Lin
    W John Wilbur
    [J]. BMC Bioinformatics, 8
  • [2] Topic-based ranking in Folksonomy via probabilistic model
    Yan’an Jin
    Ruixuan Li
    Kunmei Wen
    Xiwu Gu
    Fei Xiao
    [J]. Artificial Intelligence Review, 2011, 36 : 139 - 151
  • [3] Topic-based ranking in Folksonomy via probabilistic model
    Jin, Yan'an
    Li, Ruixuan
    Wen, Kunmei
    Gu, Xiwu
    Xiao, Fei
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2011, 36 (02) : 139 - 151
  • [4] A REVIEW OF PUBMED ARTICLES RELATED TO MHEALTH USING TOPIC MODELLING
    Kreiner, K.
    Modre-Osprian, R.
    Schreier, G.
    [J]. EHEALTH2012 - HEALTH INFORMATICS MEETS EHEALTH - VON DER WISSENSCHAFT ZUR ANWENDUNG UND ZURUCK: MOBILE HEALTH & CARE - GESUNDHEITSVORSORGE IMMER UND UBERALL, 2012, : 235 - 240
  • [5] Content Patterns in Topic-Based Overlapping Communities
    Rios, Sebastian A.
    Munoz, Ricardo
    [J]. SCIENTIFIC WORLD JOURNAL, 2014,
  • [6] A topic-based document correlation model
    Jia, Xi-Ping
    Peng, Hong
    Zheng, Qj-Lun
    Jiang, Zhuo-Lin
    Li, Zhao
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2487 - 2491
  • [7] Vaccine misinformation - topic-based content analysis on Facebook
    Klimiuk, K.
    Biernacka, K.
    Balwicki, L.
    [J]. EUROPEAN JOURNAL OF PUBLIC HEALTH, 2020, 30 : V1069 - V1069
  • [8] Modeling Flickr Communities Through Probabilistic Topic-Based Analysis
    Negoescu, Radu-Andrei
    Gatica-Perez, Daniel
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2010, 12 (05) : 399 - 416
  • [9] Automatic Topic-based CF Recommendation Method Considering Subject Similarity
    Noh, KyoungJu
    Moon, KyungDuk
    Jeong, HyunTae
    [J]. 2017 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2017, : 429 - 432
  • [10] Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery
    Franciscatto, Maria Helena
    Del Fabro, Marcos Didonet
    Erpen de Bona, Luis Carlos
    Trois, Celio
    Tissot, Hegler
    [J]. ICEIS: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2022, : 163 - 170