A Big Data Analysis Platform for Healthcare on Apache Spark

Times cited: 0
Authors
Zhang, Jinwei [1]
Zhang, Yong [1]
Hu, Qingcheng [1]
Tian, Hongliang [1]
Xing, Chunxiao [1]
Affiliation
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Res Inst Informat Technol, Beijing 100084, Peoples R China
Source
SMART HEALTH, ICSH 2016 | 2017 / Vol. 10219
Keywords
Healthcare analysis platform; Cloud computing; Disease prediction; Apache Spark; Big data
DOI
10.1007/978-3-319-59858-1_4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In recent years, data mining techniques such as classification, clustering, association analysis and regression have been widely used in the healthcare field to help analyze and predict disease and to improve the quality and efficiency of medical services. This paper presents a web-based platform for big data analysis in healthcare using data mining techniques. The platform consists of three main layers: the Apache Spark Layer, the Workflow Layer and the Web Service Layer. The Apache Spark Layer provides basic Apache Spark functionality, such as regular Resilient Distributed Dataset (RDD) operations. This layer also provides a cache mechanism that reuses previously computed results as much as possible. The Workflow Layer encapsulates a variety of data mining nodes with different roles, such as data source, algorithm model or evaluation tool. These nodes can be organized into a workflow in the form of a directed acyclic graph (DAG), which is then submitted to the Apache Spark Layer for execution. We have implemented many models for healthcare big data, including Naive Bayes, Decision Tree and Logistic Regression models. The Web Service Layer implements a rich RESTful API covering data upload, workflow composition and analysis task submission. We also provide a web graphical interface through which users can perform efficient data mining without any programming. This can greatly help medical staff with no programming background diagnose patients' conditions more accurately and efficiently.
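The abstract does not include the platform's code. As a rough, hypothetical illustration of two ideas it names, executing nodes composed into a DAG and reusing results that were computed before, here is a minimal plain-Python sketch. It does not use Spark. `Node`, `WorkflowRunner`, the operation labels and the cache-key scheme (operation + parameters + upstream lineage) are all assumptions, not the paper's design, and the readings are made-up toy values.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical workflow node: a data source, transformation, model or evaluator."""
    name: str
    op: str                      # assumed operation label, e.g. "source", "filter_gt", "mean"
    params: Tuple[Any, ...]      # hashable parameters that, together with `op`, identify the computation
    func: Callable[..., Any]     # receives the outputs of `inputs`, in order
    inputs: List[str] = field(default_factory=list)


class WorkflowRunner:
    """Runs a DAG of nodes in dependency order, reusing cached results across runs."""

    def __init__(self) -> None:
        self.cache: Dict[tuple, Any] = {}
        self.hits = 0

    def run(self, nodes: List[Node]) -> Dict[str, Any]:
        by_name = {n.name: n for n in nodes}
        # graphlib expects node -> predecessors and raises CycleError if the graph is not a DAG.
        graph = {n.name: set(n.inputs) for n in nodes}
        results: Dict[str, Any] = {}
        signatures: Dict[str, tuple] = {}
        for name in TopologicalSorter(graph).static_order():
            node = by_name[name]
            # Assumed cache key: this node's op and params plus its inputs' keys (its lineage),
            # so identical sub-workflows share results even across separate workflow runs.
            sig = (node.op, node.params, tuple(signatures[i] for i in node.inputs))
            signatures[name] = sig
            if sig in self.cache:
                self.hits += 1
            else:
                self.cache[sig] = node.func(*(results[i] for i in node.inputs))
            results[name] = self.cache[sig]
        return results


# Toy, made-up glucose readings (mg/dL); not data from the paper.
readings = Node("src", "source", ("toy_glucose",), lambda: [95, 140, 110, 180])
high = Node("high", "filter_gt", (120,), lambda xs: [x for x in xs if x > 120], ["src"])
mean = Node("mean", "mean", (), lambda xs: sum(xs) / len(xs), ["high"])
count = Node("count", "count", (), len, ["high"])

runner = WorkflowRunner()
first = runner.run([readings, high, mean])
second = runner.run([readings, high, count])  # "src" and "high" are served from the cache
```

Keying the cache on lineage rather than on node names is one plausible way to reuse earlier results when a user recomposes a workflow. A Spark-based implementation would more likely keep intermediate RDDs or DataFrames in memory (for example with `persist()`), which this sketch does not attempt.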
Pages: 32-43
Number of pages: 12