A large-scale study on research code quality and execution

Cited by: 0

Authors:

Ana Trisovic

Matthew K. Lau

Thomas Pasquier

Mercè Crosas

Affiliations:

[1] Harvard University, Institute for Quantitative Social Science

[2] Chinese Academy of Sciences, CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology

[3] University of British Columbia, Department of Computer Science

Source:

Scientific Data, Volume 9

DOI:

Not available

Abstract:

This article presents a study on the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. We identify common coding errors and resolve some of them with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
