Examining linguistic shifts between preprints and publications

Times cited: 7
Authors
Nicholson, David N. [1]
Rubinetti, Vincent [1, 2]
Hu, Dongbo [1]
Thielk, Marvin [3]
Hunter, Lawrence E. [4]
Greene, Casey S. [1, 2, 5]
Affiliations
[1] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Colorado, Ctr Hlth AI, Sch Med, Aurora, CO 80045 USA
[3] Elsevier, Philadelphia, PA USA
[4] Univ Colorado, Ctr Computat Pharmacol, Sch Med, Aurora, CO USA
[5] Univ Colorado, Dept Biochem & Mol Genet, Sch Med, Aurora, CO 80045 USA
Funding
National Institutes of Health (NIH)
DOI
10.1371/journal.pbio.3001470
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have largely focused on article metadata and on how often these preprints are downloaded, cited, published, and discussed online. One element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints with published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as to observe where the preprint would be positioned within a published article landscape.
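The abstract does not spell out how document embeddings are built or compared. As a minimal sketch, assuming a document embedding is the average of the word2vec vectors of a document's in-vocabulary tokens and that candidate pairs are ranked by cosine similarity, matching a preprint to its most similar published article could look like the following. The helper names (`document_embedding`, `cosine_similarity`, `closest_match`) and the toy two-dimensional vectors are hypothetical illustrations, not the authors' code; a real pipeline would load vectors from a trained word2vec model.

```python
import math

# Hypothetical helpers: a sketch of embedding-based matching, not the authors' pipeline.

def document_embedding(tokens, word_vectors):
    """Average the vectors of in-vocabulary tokens; return None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest_match(query, candidates):
    """Return the key of the candidate embedding most similar to the query."""
    return max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))

# Toy word vectors standing in for a trained word2vec model (assumed values).
word_vectors = {
    "gene": [1.0, 0.0],
    "expression": [0.8, 0.2],
    "neuron": [0.0, 1.0],
    "spike": [0.1, 0.9],
}

# Out-of-vocabulary tokens such as "unknownword" are simply skipped.
preprint = document_embedding(["gene", "expression", "unknownword"], word_vectors)
published = {
    "paper_a": document_embedding(["gene", "expression"], word_vectors),
    "paper_b": document_embedding(["neuron", "spike"], word_vectors),
}
print(closest_match(preprint, published))  # paper_a
```

Averaging discards word order but yields fixed-length vectors for documents of any length, which makes nearest-neighbor search over large collections straightforward.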
Pages: 22