Renku: a platform for sustainable data science

Cited by: 0
Authors
Roskar, Rok [1]
Ramakrishnan, Chandrasekhar [1]
Volpi, Michele [1]
Perez-Cruz, Fernando [1]
Alisafaee, Mohammad [2]
Fischer, Philipp [3]
Gasser, Lilian [1]
Harris, Eliza Jean [1]
Ozdemir, Firat [1]
Paitz, Patrick [3]
Remlinger, Carl [2]
Salamanca, Luis [1]
Grubenmann, Ralf [1]
Olevski, Tasko [1]
Garcia, Elisabet Capon [1]
Cavazzi, Lorenzo [1]
Chrobasik, Jakub [2]
Cordoba, Andrea [1]
Degano, Alessandro [2]
Dupre, Jimena [1]
Johnson, Wesley [1]
Kettner, Eike [1]
Kinkead, Laura [1]
Murphy, Sean [1]
Thiebaut, Flora [1]
Verscheure, Olivier [1, 2]
Affiliations
[1] Swiss Fed Inst Technol, Swiss Data Sci Ctr, Zurich, Switzerland
[2] Ecole Polytech Fed Lausanne, Swiss Data Sci Ctr, Lausanne, Switzerland
[3] WSL, Swiss Fed Inst Forest Snow & Landscape Res, Birmensdorf, Switzerland
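Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract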
Data and code working together is fundamental to machine learning (ML), but the context around datasets and interactions between datasets and code are in general captured only rudimentarily. Context such as how the dataset was prepared and created, what source data were used, what code was used in processing, how the dataset evolved, and where it has been used and reused can provide much insight, but this information is often poorly documented. That is unfortunate since it makes datasets into black-boxes with potentially hidden characteristics that have downstream consequences. We argue that making dataset preparation more accessible and dataset usage easier to record and document would have significant benefits for the ML community: it would allow for greater diversity in datasets by inviting modification to published sources, simplify use of alternative datasets and, in doing so, make results more transparent and robust, while allowing for all contributions to be adequately credited. We present a platform, Renku, designed to support and encourage such sustainable development and use of data, datasets, and code, and we demonstrate its benefits through a few illustrative projects which span the spectrum from dataset creation to dataset consumption and showcasing.
Pages: 13