Examining linguistic shifts between preprints and publications

Cited by: 7
Authors
Nicholson, David N. [1]
Rubinetti, Vincent [1, 2]
Hu, Dongbo [1]
Thielk, Marvin [3]
Hunter, Lawrence E. [4]
Greene, Casey S. [1, 2, 5]
Affiliations
[1] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Perelman Sch Med, Philadelphia, PA 19104 USA
[2] Univ Colorado, Ctr Hlth AI, Sch Med, Aurora, CO 80045 USA
[3] Elsevier, Philadelphia, PA USA
[4] Univ Colorado, Ctr Computat Pharmacol, Sch Med, Aurora, CO USA
[5] Univ Colorado, Dept Biochem & Mol Genet, Sch Med, Aurora, CO 80045 USA
Funding
National Institutes of Health (USA)
DOI
10.1371/journal.pbio.3001470
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare linguistic features of bioRxiv preprints with those of published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint-peer-reviewed article pairs, and identify journals that publish papers linguistically similar to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint and to observe where the preprint would be positioned within a published article landscape.
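The embedding-based journal matching described in the abstract can be illustrated with a minimal sketch. It assumes a document embedding is the average of its word vectors and that each journal is summarized by a centroid vector compared by cosine similarity; the abstract does not specify either choice. The toy vectors, journal names, and function names below are hypothetical and stand in for a trained word2vec model.

```python
import math

# Hypothetical toy vectors standing in for a preprint-trained word2vec model.
WORD_VECTORS = {
    "gene": [0.9, 0.1, 0.0],
    "expression": [0.8, 0.2, 0.1],
    "neuron": [0.1, 0.9, 0.0],
    "spike": [0.0, 0.8, 0.2],
    "algorithm": [0.1, 0.1, 0.9],
}


def document_embedding(tokens, word_vectors):
    """Average the vectors of in-vocabulary tokens (an assumed aggregation).

    Returns None when no token is in the vocabulary.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_journals(preprint_tokens, journal_centroids, word_vectors):
    """Sort journals from most to least similar to the preprint embedding."""
    query = document_embedding(preprint_tokens, word_vectors)
    if query is None:
        return []
    scored = [
        (name, cosine_similarity(query, centroid))
        for name, centroid in journal_centroids.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical journal centroids (in practice, averages of article embeddings).
JOURNAL_CENTROIDS = {
    "Hypothetical Genomics Journal": [0.85, 0.15, 0.05],
    "Hypothetical Neuroscience Journal": [0.05, 0.85, 0.10],
}

ranking = rank_journals(["gene", "expression", "unseen"], JOURNAL_CENTROIDS, WORD_VECTORS)
```

Here `ranking` lists the genomics-like centroid first because the query tokens lie closest to it; out-of-vocabulary tokens such as `"unseen"` are simply skipped.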
Pages: 22
Related papers
50 in total
  • [21] Examining potential shortcomings in using phase shifts as a link between experiment and QCD
    Svarc, A.
    PHYSICAL REVIEW C, 2013, 87 (06)
  • [22] The relationship between bioRxiv preprints, citations and altmetrics
    Fraser, Nicholas
    Momeni, Fakhri
    Mayr, Philipp
    Peters, Isabella
    QUANTITATIVE SCIENCE STUDIES, 2020, 1 (02): 618 - 638
  • [24] CURRENT PUBLICATIONS ON PHRASEOLOGY AND LINGUISTIC RITUALS - A BIBLIOGRAPHY
    STEIN, S
    DEUTSCHE SPRACHE, 1994, 22 (02): 152 - 180
  • [25] Reach and impact of pharmaceutical industry-affiliated preprints, and subsequent peer-reviewed publications
    D'Angelo, Gina
    Gothard, David
    Law, Lisa
    Philippon, Valerie
    Southam, Eric
    Wieting, Susan
    Lang, Heather
    CURRENT MEDICAL RESEARCH AND OPINION, 2019, 35 : 22 - 22
  • [26] Examining data visualization pitfalls in scientific publications
    Vinh T Nguyen
    Kwanghee Jung
    Vibhuti Gupta
    Visual Computing for Industry, Biomedicine, and Art, 4
  • [27] Examining data visualization pitfalls in scientific publications
    Nguyen, Vinh T.
    Jung, Kwanghee
    Gupta, Vibhuti
    VISUAL COMPUTING FOR INDUSTRY BIOMEDICINE AND ART, 2021, 4 (01)
  • [28] Examining the impact of publications in Chemistry Education Research
    Lewis, Scott E.
    Raker, Jeffrey R.
    Ye, Li
    Van Norman, Benjamin R.
    Oueini, Razanne
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 248
  • [29] Examining connections between the physical and the mental in education: A linguistic analysis of PE teaching and learning
    Slater, Tammy
    Butler, Joy I.
    LINGUISTICS AND EDUCATION, 2015, 30 : 12 - 25
  • [30] No evidence of important difference in summary treatment effects between COVID-19 preprints and peer-reviewed publications: a meta-epidemiological study
    Davidson, Mauricia
    Evrenoglou, Theodoros
    Grana, Carolina
    Chaimani, Anna
    Boutron, Isabelle
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2023, 162 : 90 - 97