Data quality for federated medical data lakes

Cited by: 7
Authors
Eder, Johann [1]
Shekhovtsov, Vladimir A. [1]
Affiliations
[1] Univ Klagenfurt, Klagenfurt, Austria
Keywords
Biobank; Metadata; Data quality; Data lake; Privacy; LOINC; Metadata and ontologies; INFORMATION-SYSTEMS; HEALTH-CARE; IMPLEMENTATION; INTEGRATION; BIOBANKS;
DOI
10.1108/IJWIS-03-2021-0026
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Purpose: Medical research requires biological material and data collected through biobanks in reliable processes with quality assurance. Medical studies based on data with unknown or questionable quality are useless or even dangerous, as evidenced by recent examples of withdrawn studies. Medical data sets consist of highly sensitive personal data, which has to be protected carefully and is available for research only after the approval of ethics committees. The purpose of this research is to propose an architecture to support researchers to efficiently and effectively identify relevant collections of material and data with documented quality for their research projects while observing strict privacy rules.
Design/methodology/approach: Following a design science approach, this paper develops a conceptual model for capturing and relating metadata of medical data in biobanks to support medical research.
Findings: This study describes the landscape of biobanks as federated medical data lakes such as the collections of samples and their annotations in the European federation of biobanks (Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium, BBMRI-ERIC) and develops a conceptual model capturing schema information with quality annotation. This paper discusses the quality dimensions for data sets for medical research in-depth and proposes representations of both the metadata and data quality documentation with the aim to support researchers to effectively and efficiently identify suitable data sets for medical studies.
Originality/value: This novel conceptual model for metadata for medical data lakes has a unique focus on the high privacy requirements of the data sets contained in medical data lakes and also stands out in the detailed representation of data quality and metadata quality of medical data sets.
Pages: 407-426
Page count: 20
Related papers
50 in total
  • [41] Effective data quality management for electronic medical record data using SMART DATA
    Lee, Seunghee
    Roh, Gyun-Ho
    Kim, Jong-Yeup
    Lee, Young Ho
    Woo, Hyekyung
    Lee, Suehyun
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2023, 180
  • [42] A Game-Theoretic Federated Learning Framework for Data Quality Improvement
    Zhang, Lefeng
    Zhu, Tianqing
    Xiong, Ping
    Zhou, Wanlei
    Yu, Philip S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 10952 - 10966
  • [43] Federated Multi-view Learning for Private Medical Data Integration and Analysis
    Che, Sicong
    Kong, Zhaoming
    Peng, Hao
    Sun, Lichao
    Leow, Alex
    Chen, Yong
    He, Lifang
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2022, 13 (04)
  • [44] ExDRa: Exploratory Data Science on Federated Raw Data
    Baunsgaard, Sebastian
    Boehm, Matthias
    Chaudhary, Ankit
    Derakhshan, Behrouz
    Geisselsoeder, Stefan
    Grulich, Philipp M.
    Hildebrand, Michael
    Innerebner, Kevin
    Markl, Volker
    Neubauer, Claus
    Osterburg, Sarah
    Ovcharenko, Olga
    Redyuk, Sergey
    Rieger, Tobias
    Mahdiraji, Alireza Rezaei
    Wrede, Sebastian Benjamin
    Zeuch, Steffen
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2450 - 2463
  • [45] Personalized Retrogress-Resilient Federated Learning Toward Imbalanced Medical Data
    Chen, Zhen
    Yang, Chen
    Zhu, Meilu
    Peng, Zhe
    Yuan, Yixuan
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (12) : 3663 - 3674
  • [46] An asynchronous federated learning-assisted data sharing method for medical blockchain
    Gan, Chenquan
    Xiao, Xinghai
    Zhang, Yiye
    Zhu, Qingyi
    Bi, Jichao
    Jain, Deepak Kumar
    Saini, Akanksha
    Jain, Deepak Kumar (dkj@ieee.org), 2025, 55 (02)
  • [47] A fine-grained medical data sharing scheme based on federated learning
    Liu, Wei
    Zhang, Ying-Hui
    Li, Yi-Fei
    Zheng, Dong
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (20)
  • [48] Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data
    Rahimian, Shadi
    Kerkouche, Raouf
    Kurth, Ina
    Fritz, Mario
    CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 411 - 425
  • [49] Federated Query processing for Big Data in Data Science
    Muniswamaiah, Manoj
    Agerwala, Tilak
    Tappert, Charles C.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 6145 - 6147
  • [50] Systematic assessment and improvement of medical data quality
    Jacke, C. O.
    Kalder, M.
    Koller, M.
    Wagner, U.
    Albert, U. S.
    BUNDESGESUNDHEITSBLATT-GESUNDHEITSFORSCHUNG-GESUNDHEITSSCHUTZ, 2012, 55 (11-12) : 1495 - 1503