Examining linguistic shifts between preprints and publications

Cited by: 7
Authors
Nicholson, David N. [1]
Rubinetti, Vincent [1, 2]
Hu, Dongbo [1]
Thielk, Marvin [3]
Hunter, Lawrence E. [4]
Greene, Casey S. [1, 2, 5]
Affiliations
[1] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Colorado, Ctr Hlth AI, Sch Med, Aurora, CO 80045 USA
[3] Elsevier, Philadelphia, PA USA
[4] Univ Colorado, Ctr Computat Pharmacol, Sch Med, Aurora, CO USA
[5] Univ Colorado, Dept Biochem & Mol Genet, Sch Med, Aurora, CO 80045 USA
Funding
US National Institutes of Health
DOI
10.1371/journal.pbio.3001470
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features within bioRxiv preprints with those of published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish papers linguistically similar to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.
Pages: 22
Related papers
50 in total
  • [31] Emerging voices or linguistic silence?: Examining a New Zealand linguistic landscape
    Macalister, John
    MULTILINGUA-JOURNAL OF CROSS-CULTURAL AND INTERLANGUAGE COMMUNICATION, 2010, 29 (01): : 55 - 75
  • [32] ROMANS AND ITALIANS IN DELOS - A LINGUISTIC CHECKLIST OF RECENT PUBLICATIONS
    POCCETTI, P
    ATHENAEUM-STUDI PERIODICI DI LETTERATURA E STORIA DELL ANTICHITA, 1984, 72 (3-4): : 646 - 656
  • [34] LINGUISTIC HISTORY PUBLICATIONS OF LORAND-EOTVOS-UNIVERSITY
    VOIGT, V
    ACTA ETHNOGRAPHICA ACADEMIAE SCIENTIARUM HUNGARICAE, 1979, 28 (1-4): : 440 - 442
  • [35] Reporting of funding and conflicts of interest improved from preprints to peer-reviewed publications of biomedical research
    Itani, Dima
    Lababidi, Ghena
    Itani, Rola
    El Ghoul, Tala
    Hamade, Lama
    Hijazi, Ayat R. A.
    Khabsa, Joanne
    Akl, Elie A.
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2022, 149 : 146 - 153
  • [36] Linguistic and Cultural Shifts of the Aranadan Tribe in Kerala
    Robert, Sam
    GLOCAL CONFERENCE 2019 IN ASIA (THE CALA 2019): REVITALIZATION AND REPRESENTATION, 2019, : 341 - 346
  • [37] Literary avant-garde in linguistic shifts
    Unknown
    NOVYI MIR, 2018, (08): : 209 - 209
  • [38] Examining International Research Collaboration during the COVID-19 Pandemic using arXiv Preprints
    He, Jiangen
    Yan, Erjia
    Ni, Chaoqun
    18TH INTERNATIONAL CONFERENCE ON SCIENTOMETRICS & INFORMETRICS (ISSI2021), 2021, : 511 - 516
  • [39] Examining the Gender Gap in Emergency Medicine Research Publications
    Jacobs, Sarah A.
    Van Loveren, Kate
    Gottlieb, Dana
    Brave, Martina
    Loman, Jesse
    Weinman, Layne
    Kwon, Nancy
    ANNALS OF EMERGENCY MEDICINE, 2022, 79 (02) : 187 - 195
  • [40] Examining Wikipedia across Linguistic and Temporal Borders
    Tinati, Ramine
    Gaskell, Paul
    Tiropanis, Thanassis
    Phillipe, Olivier
    Hall, Wendy
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 445 - 450