Predicting substantive biomedical citations without full text

Cited by: 3
Authors
Hoppe, Travis A. [1]
Arabi, Salsabil [2]
Hutchins, B. Ian [2]
Affiliations
[1] CDCP, Off Director, Natl Ctr Hlth Stat, Hyattsville, MD 20782 USA
[2] Univ Wisconsin Madison, Informat Sch, Sch Comp Data & Informat Sci, Coll Letters & Sci, Madison, WI 53706 USA
Keywords
science policy; machine learning; citation analysis; artificial intelligence; bench to bedside translation; preprints
DOI
10.1073/pnas.2213697120
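Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract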
Insights from biomedical citation networks can be used to identify promising avenues for accelerating research and its downstream bench-to-bedside translation. Citation analysis generally assumes that each citation documents substantive knowledge transfer that informed the conception, design, or execution of the main experiments. Citations may exist for other reasons. In this paper, we take advantage of late-stage citations added during peer review because these are less likely to represent substantive knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify late-stage citations. The model relies only on the title, abstract, and citations to previous articles but not the full text or future citation patterns, making it suitable for publications as soon as they are released, or those behind a paywall (the vast majority). We find that high prediction scores identify late-stage citations that were likely added during the peer review process as well as those more likely to be rhetorical, such as journal self-citations added during review. Our model conversely gives low prediction scores to early-stage citations and citation classes that are known to represent substantive knowledge transfer. Using this model, we find that US federally funded biomedical research publications represent 30% of the predicted early-stage (and more likely to be substantive) knowledge transfer from basic studies to clinical research, even though these comprise only 10% of the literature. This is a threefold overrepresentation in this important type of knowledge flow.
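The threefold figure at the end of the abstract is the ratio of two shares: 30% of the predicted early-stage knowledge transfer divided by 10% of the literature. A minimal sketch of that arithmetic follows; the function name `overrepresentation` and its parameter names are illustrative assumptions and do not come from the paper.

```python
def overrepresentation(share_of_flow_pct: float, share_of_literature_pct: float) -> float:
    """Ratio of a group's share of a knowledge flow to its share of the literature.

    Both arguments are percentages. The name and signature are hypothetical,
    used only to illustrate the arithmetic stated in the abstract.
    """
    if share_of_literature_pct <= 0:
        raise ValueError("share_of_literature_pct must be positive")
    return share_of_flow_pct / share_of_literature_pct


# Figures quoted in the abstract: 30% of predicted early-stage transfer,
# 10% of the literature.
ratio = overrepresentation(30, 10)
print(ratio)  # 3.0
```

A ratio of 1.0 would mean a group contributes to the flow in proportion to its share of the literature; values above 1.0 indicate overrepresentation.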
Pages: 11