A large-scale study on research code quality and execution

Cited by: 0

Authors:

Ana Trisovic

Matthew K. Lau

Thomas Pasquier

Mercè Crosas

Affiliations:

[1] Harvard University, Institute for Quantitative Social Science

[2] Chinese Academy of Sciences, CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology

[3] University of British Columbia, Department of Computer Science

Source:

Scientific Data, Vol. 9 (2022)

Abstract:

This article presents a study on the quality and execution of research code from publicly available replication datasets in the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions that address aspects affecting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets containing over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. We identify common coding errors and resolve some of them with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
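The re-execution and cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each R file is run with `Rscript` from its own directory, treats a non-zero exit status or a timeout as a failure (a simplification), and applies a single hypothetical cleaning rule that comments out `setwd()` calls, since hard-coded working directories usually exist only on the original author's machine. The helper names `clean_r_source`, `run_r_file`, and `failure_rate` are hypothetical.

```python
import os
import re
import subprocess

# Hypothetical cleaning rule (an assumption, not the study's exact procedure):
# comment out lines whose first statement is setwd(), because such calls
# usually point at a directory that exists only on the author's machine.
SETWD_LINE = re.compile(r"^([ \t]*)(setwd\s*\(.*)$", re.MULTILINE)


def clean_r_source(source: str) -> str:
    """Return R source with every setwd() line commented out, indentation kept."""
    return SETWD_LINE.sub(r"\1# removed by cleaner: \2", source)


def run_r_file(path: str, timeout_s: int = 3600) -> bool:
    """Run one R file with Rscript from its own directory; return True on failure.

    Assumes Rscript is on PATH. A non-zero exit status and exceeding the
    time limit both count as failure here (a simplification).
    """
    try:
        result = subprocess.run(
            ["Rscript", os.path.basename(path)],
            cwd=os.path.dirname(path) or ".",
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return result.returncode != 0


def failure_rate(failed: list[bool]) -> float:
    """Fraction of executed files that failed (True means the file failed)."""
    if not failed:
        raise ValueError("no execution outcomes recorded")
    return sum(failed) / len(failed)
```

Computing `failure_rate` once over the outcomes of the original files and once over the cleaned files yields the two quantities the abstract compares; the study's own cleaning procedure may differ from the single rule assumed here.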
