Predicting substantive biomedical citations without full text

Cited by: 3
Authors
Hoppe, Travis A. [1]
Arabi, Salsabil [2]
Hutchins, B. Ian [2]
Affiliations
[1] CDCP, Off Director, Natl Ctr Hlth Stat, Hyattsville, MD 20782 USA
[2] Univ Wisconsin Madison, Informat Sch, Sch Comp Data & Informat Sci, Coll Letters & Sci, Madison, WI 53706 USA
Keywords
science policy; machine learning; citation analysis; artificial intelligence; bench-to-bedside translation; preprints
DOI
10.1073/pnas.2213697120
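Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract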
Insights from biomedical citation networks can be used to identify promising avenues for accelerating research and its downstream bench-to-bedside translation. Citation analysis generally assumes that each citation documents substantive knowledge transfer that informed the conception, design, or execution of the main experiments. Citations may, however, exist for other reasons. In this paper, we take advantage of late-stage citations added during peer review because these are less likely to represent substantive knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify late-stage citations. The model relies only on the title, abstract, and citations to previous articles, not on the full text or future citation patterns, making it suitable for publications as soon as they are released and for those behind a paywall (the vast majority). We find that high prediction scores identify late-stage citations that were likely added during the peer review process as well as those more likely to be rhetorical, such as journal self-citations added during review. Conversely, our model gives low prediction scores to early-stage citations and citation classes that are known to represent substantive knowledge transfer. Using this model, we find that US federally funded biomedical research publications represent 30% of the predicted early-stage (and more likely to be substantive) knowledge transfer from basic studies to clinical research, even though these comprise only 10% of the literature. This is a threefold overrepresentation in this important type of knowledge flow.
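
The threefold figure in the abstract follows from dividing the share of predicted early-stage knowledge transfer (30%) by the share of the literature (10%). A minimal sketch of that arithmetic is given below; the function name `overrepresentation` and its parameters are illustrative assumptions and do not come from the paper.

```python
def overrepresentation(share_of_flow_pct: float, share_of_literature_pct: float) -> float:
    """Ratio of a group's share of a knowledge flow to its share of the literature.

    Both arguments are percentages. The name and signature are hypothetical,
    chosen only to illustrate the arithmetic stated in the abstract.
    """
    if share_of_literature_pct <= 0:
        raise ValueError("share_of_literature_pct must be positive")
    return share_of_flow_pct / share_of_literature_pct


# Figures quoted in the abstract: 30% of predicted early-stage transfer,
# 10% of the literature.
ratio = overrepresentation(30, 10)
print(f"Overrepresentation: {ratio:.1f}x")
```

A ratio above 1 means the group contributes more to the knowledge flow than its share of the literature alone would suggest.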
Pages: 11