A Big Data Analysis Platform for Healthcare on Apache Spark

Cited by: 0
Authors
Zhang, Jinwei [1]
Zhang, Yong [1]
Hu, Qingcheng [1]
Tian, Hongliang [1]
Xing, Chunxiao [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Res Inst Informat Technol, Beijing 100084, Peoples R China
Source
SMART HEALTH, ICSH 2016 | 2017 / Vol. 10219
Keywords
Healthcare analysis platform; Cloud computing; Disease prediction; Apache Spark; Big data;
DOI
10.1007/978-3-319-59858-1_4
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, data mining techniques such as classification, clustering, association and regression have been widely used in healthcare to help analyze and predict disease and to improve the quality and efficiency of medical services. This paper presents a web-based platform for big data analysis in healthcare using data mining techniques. The platform consists of three main layers: the Apache Spark Layer, the Workflow Layer and the Web Service Layer. The Apache Spark Layer exposes basic Apache Spark functionality as regular Resilient Distributed Dataset (RDD) operations. It also provides a cache mechanism that reuses previously computed results as much as possible. The Workflow Layer encapsulates a variety of data mining nodes, each with a role such as data source, algorithm model or evaluation tool. These nodes can be organized into a workflow, which is a directed acyclic graph (DAG), and the workflow is then submitted to the Apache Spark Layer for execution. We have implemented many models for healthcare big data, including Naive Bayes, Decision Tree and Logistic Regression. The Web Service Layer implements a rich RESTful API covering data uploading, workflow composition and analysis task submission. We also provide a web-based graphical interface. Through this interface, users can carry out data mining efficiently without any programming, which can greatly help medical staff without programming skills diagnose patients' conditions more accurately and efficiently.
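The workflow idea in the abstract (nodes with roles, organized into a DAG, with earlier results reused) can be illustrated with a minimal sketch. This is not the platform's code: the `Workflow` class, the node names and the toy threshold "model" are all hypothetical, plain Python stands in for Spark, and the cache is simply keyed by node name, which is an assumption since the abstract does not describe how the real cache works.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Hypothetical workflow: nodes are callables, edges are dependencies."""

    def __init__(self):
        self.funcs = {}  # node name -> callable taking upstream results
        self.deps = {}   # node name -> list of upstream node names
        self.cache = {}  # node name -> result computed earlier
        self.runs = []   # names of nodes that were actually executed

    def add_node(self, name, func, deps=()):
        self.funcs[name] = func
        self.deps[name] = list(deps)

    def run(self):
        # static_order() yields every node after all of its dependencies
        # and raises graphlib.CycleError if the graph is not acyclic.
        for name in TopologicalSorter(self.deps).static_order():
            if name in self.cache:
                continue  # reuse the cached result instead of recomputing
            inputs = [self.cache[d] for d in self.deps[name]]
            self.cache[name] = self.funcs[name](*inputs)
            self.runs.append(name)
        return self.cache


# Toy nodes (assumptions for illustration only).
def load_rows():
    # Data source node: (feature, label) pairs.
    return [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]


def fit_threshold(rows):
    # "Model" node: use the mean feature value as a decision threshold.
    return sum(x for x, _ in rows) / len(rows)


def evaluate(rows, threshold):
    # Evaluation node: fraction of rows classified correctly.
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)


wf = Workflow()
wf.add_node("source", load_rows)
wf.add_node("model", fit_threshold, deps=["source"])
wf.add_node("evaluation", evaluate, deps=["source", "model"])

results = wf.run()  # executes all three nodes in dependency order
wf.run()            # second call finds every result in the cache
```

A more realistic cache would key results by node configuration and input data rather than by name alone, so that editing one node invalidates only its downstream results.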
Pages: 32-43
Number of pages: 12
Related papers
50 in total
  • [41] Applying Apache Spark on Streaming Big Data for Health Status Prediction
    Ebada, Ahmed Ismail
    Elhenawy, Ibrahim
    Jeong, Chang-Won
    Nam, Yunyoung
    Elbakry, Hazem
    Abdelrazek, Samir
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (02): : 3511 - 3527
  • [42] Performance Prediction for Apache Spark Platform
    Wang, Kewen
    Khan, Mohammad Maifi Hasan
    [J]. 2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 166 - 173
  • [43] Big data processing with Apache Spark in university institutions: spark streaming and machine learning algorithm
    Boachie, Emmanuel
    Li, Chunlin
    [J]. INTERNATIONAL JOURNAL OF CONTINUING ENGINEERING EDUCATION AND LIFE-LONG LEARNING, 2019, 29 (1-2) : 5 - 20
  • [44] Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning
    Alotaibi, Shoayee
    Mehmood, Rashid
    Katib, Iyad
    Rana, Omer
    Albeshri, Aiiad
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (04):
  • [45] A Platform for Interactive Data Science with Apache Spark for On-premises Infrastructure
    Lokuciejewski, Rafal
    Schuessele, Dominik
    Wilhelm, Florian
    Groppe, Sven
    [J]. CLOSER: PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2021, : 65 - 76
  • [46] Interactive Big Data Management in Healthcare Using Spark
    Archenaa, J.
    Anita, E. A. Mary
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL SYMPOSIUM ON BIG DATA AND CLOUD COMPUTING CHALLENGES (ISBCC - 16'), 2016, 49 : 265 - 272
  • [47] Alignment-free Genomic Analysis via a Big Data Spark Platform
    Petrillo, Umberto Ferraro
    Palini, Francesco
    Cattaneo, Giuseppe
    Giancarlo, Raffaele
    [J]. BIOINFORMATICS, 2021, 37 (12) : 1658 - 1665
  • [48] Application of Improved Recommendation System Based on Spark Platform in Big Data Analysis
    Xie, Li
    Zhou, Wenbo
    Li, Yaosen
    [J]. CYBERNETICS AND INFORMATION TECHNOLOGIES, 2016, 16 (06) : 245 - 255
  • [49] Concept and benchmark results for Big Data energy forecasting based on Apache Spark
    González Ordiano J.Á.
    Bartschat A.
    Ludwig N.
    Braun E.
    Waczowicz S.
    Renkamp N.
    Peter N.
    Düpmeier C.
    Mikut R.
    Hagenmeyer V.
    [J]. Journal of Big Data, 5 (1)
  • [50] A Big Data Framework for Intrusion Detection in Smart Grids Using Apache Spark
    Vimalkumar, K.
    Radhika, N.
    [J]. 2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 198 - 204