A large-scale study on research code quality and execution

Cited by: 0
Authors
Ana Trisovic
Matthew K. Lau
Thomas Pasquier
Mercè Crosas
Affiliations
[1] Harvard University, Institute for Quantitative Social Science
[2] Chinese Academy of Sciences, CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology
[3] University of British Columbia, Department of Computer Science
DOI
Not available
Abstract
This article presents a study of the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. We identify common coding errors and resolve some of them with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% still failed after code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals’ collections and discuss the impact of journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
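The re-execution step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the outcome labels, the one-hour time limit, and the use of a plain subprocess in place of a clean runtime environment are all assumptions made for illustration. The sketch runs each script in its own directory, records whether it exited cleanly, and reports the share that failed.

```python
import subprocess
from collections import Counter
from pathlib import Path


def run_script(interpreter, script, timeout_s=3600):
    """Run one script from its own directory and classify the outcome.

    `interpreter` is a command prefix such as ["Rscript"]; the timeout
    value is an illustrative assumption, not a figure from the study.
    """
    try:
        proc = subprocess.run(
            [*interpreter, script.name],
            cwd=script.parent,       # relative paths resolve next to the script
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "success" if proc.returncode == 0 else "error"


def reexecute(interpreter, root, pattern="*.R"):
    """Run every matching file under `root` and map its path to an outcome."""
    scripts = sorted(Path(root).rglob(pattern))
    return {str(p): run_script(interpreter, p) for p in scripts}


def failure_rate(outcomes):
    """Share of scripts that did not finish with exit code 0."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return 0.0 if total == 0 else 1 - counts["success"] / total
```

Assuming R is installed and a dataset has been unpacked into a hypothetical directory `dataset/`, `failure_rate(reexecute(["Rscript"], "dataset").values())` would give a figure comparable in spirit to the failure percentages quoted above. Comparing that figure before and after a cleaning pass over the same files mirrors the two-stage comparison in the abstract.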
Related papers
50 in total
  • [1] A large-scale study on research code quality and execution
    Trisovic, Ana
    Lau, Matthew K.
    Pasquier, Thomas
    Crosas, Merce
    SCIENTIFIC DATA, 2022, 9 (01)
  • [2] A Large-Scale Study of Programming Languages and Code Quality in GitHub
    Ray, Baishakhi
    Posnett, Daryl
    Devanbu, Premkumar
    Filkov, Vladimir
    COMMUNICATIONS OF THE ACM, 2017, 60 (10) : 91 - 100
  • [3] Selection and Execution of large-scale projects
    Ahrens, G. -A.
    Beckmann, K. J.
    Boltze, M.
    Eisenkopf, A.
    Fricke, H.
    Knieps, G.
    Knorr, A.
    Mitusch, K.
    Oeter, S.
    Radermacher, F. -J
    Sieg, G.
    Siegmann, J.
    Schlag, B.
    Stoelzle, W.
    Vallee, D.
    Winner, H.
    BAUINGENIEUR, 2015, 90 : 129 - 139
  • [4] A Large-Scale Study on Source Code Reviewer Recommendation
    Lipcak, Jakub
    Rossi, Bruno
    44TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2018), 2018 : 378 - 387
  • [5] Modeling research on manufacturing execution system based on large-scale system cybernetics
    Wu Y.
    Xu X.-D.
    Li C.-X.
    J. Shanghai Jiaotong Univ. Sci., 2008, 13 (06) : 744 - 747
  • [6] Modeling Research on Manufacturing Execution System Based on Large-scale System Cybernetics
    吴瑜
    许晓栋
    李从心
    Journal of Shanghai Jiaotong University(Science), 2008, 13 (06) : 744 - 747
  • [7] LARGE-SCALE RESEARCH ON QUALITY OF EXPERIENCE (QoE) ALGORITHMS
    Leszczuk, Mikolaj
    Szczerba, Blazej
    Glowacz, Andrzej
    Derkacz, Jan
    Dziech, Andrzej
    Romaniak, Piotr
    COMPUTER SCIENCE-AGH, 2013, 14 (01) : 63 - 75
  • [8] On Execution Platforms for Large-Scale Aggregate Computing
    Viroli, Mirko
    Casadei, Roberto
    Pianini, Danilo
    UBICOMP'16 ADJUNCT: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING, 2016 : 1321 - 1326
  • [9] A Case Study of Refactoring Large-Scale Industrial Systems to Efficiently Improve Source Code Quality
    Szoke, Gabor
    Nagy, Csaba
    Ferenc, Rudolf
    Gyimothy, Tibor
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT V, 2014, 8583 : 524 - 540
  • [10] Large-Scale Study of Perceptual Video Quality
    Sinno, Zeina
    Bovik, Alan Conrad
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (02) : 612 - 627