SciDataFlow: a tool for improving the flow of data through science

Author
Buffalo, Vince [1, 2]
Affiliations
[1] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA USA
[2] Univ Calif Berkeley, Dept Integrat Biol, 4134 Valley Life Sci Bldg, Berkeley, CA 94720 USA
DOI
10.1093/bioinformatics/btad754
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data, increasing the risk that analyses are based on older versions of data.

Results: SciDataFlow is a fast, concurrent command-line tool paired with a simple Data Manifest specification that streamlines tracking data changes, uploading data to remote repositories, and pulling in all data necessary to reproduce a computational analysis.

Availability and implementation: SciDataFlow is available at https://github.com/vsbuffalo/scidataflow.
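
The idea of catching unnoticed data changes with a manifest can be illustrated with a minimal sketch. It is not SciDataFlow's implementation or its Data Manifest format: the function names (`file_digest`, `build_manifest`, `changed_files`), the in-memory dictionary used as a manifest, and the choice of SHA-256 are all assumptions made for illustration. The sketch records a digest per tracked file, then reports files whose current contents no longer match.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths):
    """Hypothetical manifest: map each tracked path to its digest at registration time."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(manifest):
    """Return tracked paths that are missing or whose digest differs from the manifest."""
    changed = []
    for name, recorded in manifest.items():
        p = Path(name)
        if not p.exists() or file_digest(p) != recorded:
            changed.append(name)
    return sorted(changed)


def demo():
    """Track two small files, modify one, and report status before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a.tsv"
        b = Path(tmp) / "b.tsv"
        a.write_text("x\t1\n")
        b.write_text("y\t2\n")
        manifest = build_manifest([a, b])
        before = changed_files(manifest)
        b.write_text("y\t3\n")  # simulate an unnoticed edit during analysis
        after = [Path(n).name for n in changed_files(manifest)]
        return before, after
```

Calling `demo()` returns the changed-file list before and after the simulated edit. A real tool would also persist the manifest to disk and handle remote uploads, both of which this sketch omits.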
Pages: 3