A large-scale study on research code quality and execution

Cited by: 0
Authors
Ana Trisovic
Matthew K. Lau
Thomas Pasquier
Mercè Crosas
Affiliations
[1] Harvard University, Institute for Quantitative Social Science
[2] Chinese Academy of Sciences, CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology
[3] University of British Columbia, Department of Computer Science
DOI
Not available
Abstract
This article presents a study on the quality and execution of research code from publicly available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. We identify common coding errors and resolve some of them with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% still failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
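The headline figures in the abstract are failure rates: the share of files that did not complete without error. The sketch below shows one way such a rate could be tallied. It is not the authors' pipeline: the function name `failure_rate`, the 60-second timeout, and the rule that a non-zero exit status or a timeout counts as a failure are all assumptions made for illustration, and the sketch does not attempt to reproduce the clean runtime environment or the code cleaning step.

```python
import subprocess
from pathlib import Path


def failure_rate(scripts, interpreter, timeout_s=60):
    """Run each script in its own process and return the fraction that fail.

    Assumption (not from the study): a run counts as failed when it exits
    with a non-zero status or exceeds the timeout.
    """
    scripts = [Path(s).resolve() for s in scripts]
    if not scripts:
        return 0.0
    failed = 0
    for path in scripts:
        try:
            result = subprocess.run(
                [*interpreter, str(path)],
                cwd=path.parent,  # run from the script's own directory
                capture_output=True,
                timeout=timeout_s,
            )
            if result.returncode != 0:
                failed += 1
        except subprocess.TimeoutExpired:
            failed += 1
    return failed / len(scripts)
```

For R files, a call might look like `failure_rate(Path("dataset").glob("*.R"), ["Rscript", "--vanilla"])`, where `dataset` is a hypothetical directory holding one replication dataset and `Rscript` is assumed to be installed. Passing the interpreter as a list keeps the function independent of any particular language runtime.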
Related papers
50 in total
  • [21] Understanding Source Code Comments at Large-Scale
    He, Hao
    ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019, : 1217 - 1219
  • [22] Type Migration in Large-Scale Code Bases
    Ketkar, Ameya
    ESEC/FSE'18: PROCEEDINGS OF THE 2018 26TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2018, : 965 - 967
  • [23] Query by Example in Large-Scale Code Repositories
    Balachandran, Vipin
    2015 31ST INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) PROCEEDINGS, 2015, : 467 - 476
  • [24] A large-scale analysis of bioinformatics code on GitHub
    Russell, Pamela H.
    Johnson, Rachel L.
    Ananthan, Shreyas
    Harnke, Benjamin
    Carlson, Nichole E.
    PLOS ONE, 2018, 13 (10):
  • [25] Large-scale DFT Calculations with the CONQUEST Code
    Miyazaki, T.
    Bowler, D. R.
    Gillan, M. J.
    Otsuka, T.
    Ohno, T.
    COMPUTATIONAL METHODS IN SCIENCE AND ENGINEERING, VOL 2: ADVANCES IN COMPUTATIONAL SCIENCE, 2009, 1148 : 685 - +
  • [26] T-Evos: A Large-Scale Longitudinal Study on CI Test Execution and Failure
    Chen, An Ran
    Chen, Tse-Hsun
    Wang, Shaowei
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 2352 - 2365
  • [27] The Research of Large-Scale Real Estate Enterprises' Project Quality Management
    Long, Yuxiang
    PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON MANAGEMENT INNOVATION AND PUBLIC POLICY (ICMIPP 2012), VOLS 1-6, 2012, : 1760 - 1762
  • [28] A large-scale empirical study on the lifecycle of code smell co-occurrences
    Palomba, Fabio
    Bavota, Gabriele
    Di Penta, Massimiliano
    Fasano, Fausto
    Oliveto, Rocco
    De Lucia, Andrea
    INFORMATION AND SOFTWARE TECHNOLOGY, 2018, 99 : 1 - 10
  • [29] Code Coverage and Postrelease Defects: A Large-Scale Study on Open Source Projects
    Kochhar, Pavneet Singh
    Lo, David
    Lawall, Julia
    Nagappan, Nachiappan
    IEEE TRANSACTIONS ON RELIABILITY, 2017, 66 (04) : 1213 - 1228
  • [30] Mining Preconditions of APIs in Large-Scale Code Corpus
    Hoan Anh Nguyen
    Dyer, Robert
    Nguyen, Tien N.
    Rajan, Hridesh
    22ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (FSE 2014), 2014, : 166 - 177