FAIRly big: A framework for computationally reproducible processing of large-scale data

Authors
Adina S. Wagner
Laura K. Waite
Małgorzata Wierzba
Felix Hoffstaedter
Alexander Q. Waite
Benjamin Poldrack
Simon B. Eickhoff
Michael Hanke
Affiliations
[1] Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich
[2] Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, Polish Academy of Sciences
[3] Heinrich Heine University Düsseldorf
Abstract
Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It captures machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, and that can be re-executed independently of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).