A synthetic dataset of liver disorder patients

Cited by: 0
Authors
Nicora, Giovanna [1,2]
Buonocore, Tommaso Mario [1]
Parimbelli, Enea [1,3]
Affiliations
[1] Univ Pavia, Dept Elect Comp & Biomed Engn, Pavia, Italy
[2] enGenome Srl, Pavia, Italy
[3] Univ Ottawa, Telfer Sch Management, Ottawa, ON, Canada
Source
DATA IN BRIEF | 2023 / Volume 47
Keywords
Synthetic patients; Machine learning; Bayesian network; Dataset shift; Causal model;
DOI
10.1016/j.dib.2023.108921
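Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes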
07 ; 0710 ; 09 ;
Abstract
The data in this article include 10,000 synthetic patients with liver disorders, characterized by 70 different variables, including clinical features and patient outcomes such as hospital admission or surgery. Patient data are generated, simulating real patient data as closely as possible, using a publicly available Bayesian network describing a causal model for liver disorders. By varying the network parameters, we also generated an additional set of 500 patients with characteristics that deviate from the initial patient population. We provide an overview of the synthetic data generation process and the associated scripts for generating the cohorts. This dataset can be useful for training and validating machine learning models, especially under the effect of dataset shift between training and testing sets. (c) 2023 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
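The generation process summarized in the abstract (sampling patients from a Bayesian network, then changing network parameters to obtain a shifted cohort) can be illustrated with a minimal sketch. This is not the authors' script or network: the two-node structure (disease -> symptom), the function name `sample_cohort`, and all probabilities are hypothetical values chosen only for illustration. Only the cohort sizes (10,000 and 500) come from the abstract.

```python
import numpy as np

def sample_cohort(n, p_disease, p_symptom_given_disease, seed=0):
    """Forward (ancestral) sampling from a toy network: disease -> symptom.

    p_symptom_given_disease is a pair (P(symptom | no disease),
    P(symptom | disease)). All values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Sample the parent node first.
    disease = rng.random(n) < p_disease
    # Then sample the child conditioned on the sampled parent value.
    p_symptom = np.where(disease,
                         p_symptom_given_disease[1],
                         p_symptom_given_disease[0])
    symptom = rng.random(n) < p_symptom
    # One row per synthetic patient, columns: [disease, symptom].
    return np.column_stack([disease, symptom]).astype(int)

cpt = (0.1, 0.8)  # hypothetical conditional probability table

# Baseline cohort sampled with the original (hypothetical) parameters.
base = sample_cohort(10_000, p_disease=0.2, p_symptom_given_disease=cpt, seed=0)

# Shifted cohort: same structure, but the disease prior is changed.
shifted = sample_cohort(500, p_disease=0.6, p_symptom_given_disease=cpt, seed=1)

print("baseline prevalence:", base.mean(axis=0))
print("shifted prevalence: ", shifted.mean(axis=0))
```

A model trained on `base` and evaluated on `shifted` would then face a changed marginal distribution of the parent variable, which is one simple form of dataset shift.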
Pages: 6
Related papers
50 in total
  • [1] Investigating the Optimal Parameterization of Deep Neural Network and Synthetic Data Workflow for Imbalance Liver Disorder Dataset Classification
    Diana, Nova Eka
    Ahmad, Andi Batari
    Mahardika, Zwasta Pribadi
    RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING (SCDM 2020), 2020, 978 : 88 - 97
  • [2] Novel Work of Diagnosis of Liver Cancer Using Tree Classifier on Liver Cancer Dataset (BUPA Liver Disorder)
    Tiwari, Manish
    Chakrabarti, Prasun
    Chakrabarti, Tulika
    SOFT COMPUTING SYSTEMS, ICSCS 2018, 2018, 837 : 155 - 160
  • [3] Synthetic pulse wave dataset for analysis of vascular ageing in elderly patients
    Rogov, Artem
    Gamilov, Timur
    Bragina, Anna
    Abdullaev, Magomed
    Druzhinina, Natalia
    Rodionova, Yuliya
    Shikhmagomedov, Rustam
    Tyulin, Maksim
    Podzolkov, Valeriy
    MATHEMATICAL MODELLING OF NATURAL PHENOMENA, 2024, 19
  • [4] A synthetic building operation dataset
    Han Li
    Zhe Wang
    Tianzhen Hong
    Scientific Data, 8
  • [5] An Open Dataset of Synthetic Speech
    Yaroshchuk, Artem
    Papastergiopoulos, Christoforos
    Cuccovillo, Luca
    Aichroth, Patrick
    Votis, Konstantinos
    Tzovaras, Dimitrios
    2023 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY, WIFS, 2023,
  • [6] FoR: A Dataset for Synthetic Speech Detection
    Reimao, Ricardo
    Tzerpos, Vassilios
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
  • [7] A synthetic building operation dataset
    Li, Han
    Wang, Zhe
    Hong, Tianzhen
    SCIENTIFIC DATA, 2021, 8 (01)
  • [8] Synthetic dataset for compositional learning
    Molek, Vojtech
    Hula, Jan
    DATA SCIENCE AND KNOWLEDGE ENGINEERING FOR SENSING DECISION SUPPORT, 2018, 11 : 1440 - 1445
  • [9] Weapon Violence Dataset 2.0: A synthetic dataset for violence detection
    Nadeem, Muhammad Shahroz
    Kurugollu, Fatih
    Atlam, Hany F.
    Franqueira, Virginia N. L.
    DATA IN BRIEF, 2024, 54
  • [10] Synthetic Dataset Generation of Driver Telematics
    So, Banghee
    Boucher, Jean-Philippe
    Valdez, Emiliano A.
    RISKS, 2021, 9 (04)