Predicting substantive biomedical citations without full text

Cited by: 3
Authors
Hoppe, Travis A. [1 ]
Arabi, Salsabil [2 ]
Hutchins, B. Ian [2 ]
Affiliations
[1] CDCP, Off Director, Natl Ctr Hlth Stat, Hyattsville, MD 20782 USA
[2] Univ Wisconsin Madison, Informat Sch, Sch Comp Data & Informat Sci, Coll Letters & Sci, Madison, WI 53706 USA
Keywords
science policy; machine learning; citation analysis; artificial intelligence; bench to bedside translation; PREPRINTS;
DOI
10.1073/pnas.2213697120
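Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code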
07 ; 0710 ; 09 ;
Abstract
Insights from biomedical citation networks can be used to identify promising avenues for accelerating research and its downstream bench-to-bedside translation. Citation analysis generally assumes that each citation documents substantive knowledge transfer that informed the conception, design, or execution of the main experiments. Citations may exist for other reasons. In this paper, we take advantage of late-stage citations added during peer review because these are less likely to represent substantive knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify late-stage citations. The model relies only on the title, abstract, and citations to previous articles but not the full text or future citation patterns, making it suitable for publications as soon as they are released, or those behind a paywall (the vast majority). We find that high prediction scores identify late-stage citations that were likely added during the peer review process as well as those more likely to be rhetorical, such as journal self-citations added during review. Our model conversely gives low prediction scores to early-stage citations and citation classes that are known to represent substantive knowledge transfer. Using this model, we find that US federally funded biomedical research publications represent 30% of the predicted early-stage (and more likely to be substantive) knowledge transfer from basic studies to clinical research, even though these comprise only 10% of the literature. This is a threefold overrepresentation in this important type of knowledge flow.
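The threefold figure at the end of the abstract follows from dividing the share of predicted early-stage knowledge transfer (30%) by the share of the literature (10%). A minimal sketch of that arithmetic is below; the function name `overrepresentation` and its parameter names are illustrative assumptions, not taken from the paper.

```python
def overrepresentation(share_of_flow_pct: float, share_of_literature_pct: float) -> float:
    """Ratio of a group's share of knowledge flow to its share of the literature.

    Both arguments are percentages. The name and signature are hypothetical,
    introduced only to illustrate the arithmetic stated in the abstract.
    """
    if share_of_literature_pct <= 0:
        raise ValueError("share_of_literature_pct must be positive")
    return share_of_flow_pct / share_of_literature_pct


# Figures quoted in the abstract: 30% of predicted early-stage transfer,
# 10% of the literature.
ratio = overrepresentation(30, 10)
print(f"{ratio:.1f}x overrepresentation")
```

A ratio above 1 means the group contributes more to this type of knowledge flow than its share of the literature alone would suggest.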
Pages: 11