FAIRly big: A framework for computationally reproducible processing of large-scale data

Cited by: 0
Authors
Adina S. Wagner
Laura K. Waite
Małgorzata Wierzba
Felix Hoffstaedter
Alexander Q. Waite
Benjamin Poldrack
Simon B. Eickhoff
Michael Hanke
Affiliations
[1] Research Center Jülich, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7)
[2] Polish Academy of Sciences, Laboratory of Brain Imaging, Nencki Institute of Experimental Biology
[3] Heinrich Heine University Düsseldorf
DOI
Not available
Abstract
Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).
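The idea of a machine-actionable provenance record that can be both verified and re-executed can be illustrated with a minimal sketch. The code below is an assumption-laden illustration using only the Python standard library; it does not use DataLad and does not reproduce its record format. The helper names (`run_with_provenance`, `rerun_and_verify`, `demo`) and the record layout are hypothetical.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_with_provenance(cmd, inputs, outputs, cwd):
    """Run `cmd` in `cwd` and return a JSON-serializable provenance record.

    The record layout here is hypothetical: the command plus content
    hashes of the declared input and output files.
    """
    cwd = Path(cwd)
    record = {
        "cmd": cmd,
        "inputs": {name: sha256(cwd / name) for name in inputs},
    }
    subprocess.run(cmd, cwd=cwd, check=True)
    record["outputs"] = {name: sha256(cwd / name) for name in outputs}
    return record


def rerun_and_verify(record, cwd):
    """Re-execute a record and report whether all outputs are bit-identical."""
    cwd = Path(cwd)
    # Refuse to re-run if any input differs from the recorded version.
    for name, digest in record["inputs"].items():
        if sha256(cwd / name) != digest:
            raise ValueError(f"input {name} differs from the recorded version")
    subprocess.run(record["cmd"], cwd=cwd, check=True)
    return all(
        sha256(cwd / name) == digest
        for name, digest in record["outputs"].items()
    )


def demo():
    """Record a toy computation, delete its output, then re-run and verify."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "in.txt").write_text("hello\n")
        cmd = [
            sys.executable,
            "-c",
            "from pathlib import Path; "
            "Path('out.txt').write_text(Path('in.txt').read_text().upper())",
        ]
        record = run_with_provenance(cmd, ["in.txt"], ["out.txt"], tmp)
        # Round-trip through JSON to show the record is plain, storable data.
        record = json.loads(json.dumps(record))
        (tmp / "out.txt").unlink()
        return rerun_and_verify(record, tmp)
```

Calling `demo()` records a toy computation, discards the result, and recomputes it from the stored record alone, comparing content hashes. A real framework would additionally have to pin the software environment and the data versions, which this sketch leaves out.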
Related papers
50 in total
  • [1] FAIRly big: A framework for computationally reproducible processing of large-scale data
    Wagner, Adina S.
    Waite, Laura K.
    Wierzba, Małgorzata
    Hoffstaedter, Felix
    Waite, Alexander Q.
    Poldrack, Benjamin
    Eickhoff, Simon B.
    Hanke, Michael
    [J]. SCIENTIFIC DATA, 2022, 9 (01)
  • [2] Marbor: A Novel Large-Scale Graph Data Storage and Processing Framework
    Zhou, Wei
    Gao, Yun
    Han, Jizhong
    Xu, Zhiyong
    [J]. 2014 IEEE INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2014
  • [3] Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster
    Jia, Xuya
    Yao, Zhiyi
    Peng, Chao
    Zhao, Zihao
    Lei, Bin
    Liu, Edison
    Li, Xiang
    He, Zekun
    Wang, Yachen
    Zou, Xianneng
    Zhao, Chongqing
    Chu, Jinhui
    Wang, Jilong
    Miao, Congcong
    [J]. PROCEEDINGS OF THE 2024 ACM SIGCOMM 2024 CONFERENCE, ACM SIGCOMM 2024, 2024 : 540 - 553
  • [4] GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data
    Yu, Jia
    Wu, Jinxuan
    Sarwat, Mohamed
    [J]. 23RD ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2015), 2015
  • [5] A Framework of Modeling Large-Scale Wireless Sensor Networks for Big Data Collection
    Djedouboum, Asside Christian
    Ari, Ado Adamou Abba
    Gueroui, Abdelhak Mourad
    Mohamadou, Alidou
    Thiare, Ousmane
    Aliouat, Zibouda
    [J]. SYMMETRY-BASEL, 2020, 12 (07)
  • [6] Spark-Based Large-Scale Matrix Inversion for Big Data Processing
    Liu, Jun
    Liang, Yang
    Ansari, Nirwan
    [J]. IEEE ACCESS, 2016, 4 : 2166 - 2176
  • [7] Spark-based Large-scale Matrix Inversion for Big Data Processing
    Liang, Yang
    Liu, Jun
    Fang, Cheng
    Ansari, Nirwan
    [J]. 2016 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2016
  • [8] On the Large-scale Graph Data Processing for User Interface Testing in Big Data Science Projects
    Uygun, Yasin
    Oguz, Ramazan Faruk
    Olmezogullari, Erdi
    Aktas, Mehmet S.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020 : 2049 - 2056
  • [9] A general-purpose framework for parallel processing of large-scale LiDAR data
    Li, Zhenlong
    Hodgson, Michael E.
    Li, Wenwen
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2018, 11 (01) : 26 - 47
  • [10] Towards a framework for large-scale multimedia data storage and processing on Hadoop platform
    Lai, Wei Kuang
    Chen, Yi-Uan
    Wu, Tin-Yu
    Obaidat, Mohammad S.
    [J]. The Journal of Supercomputing, 2014, 68 : 488 - 507