A Big Data Analysis Platform for Healthcare on Apache Spark

Authors
Zhang, Jinwei [1]
Zhang, Yong [1]
Hu, Qingcheng [1]
Tian, Hongliang [1]
Xing, Chunxiao [1]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Res Inst Informat Technol, Beijing 100084, Peoples R China
Source
SMART HEALTH, ICSH 2016 | 2017 / Vol. 10219
Keywords
Healthcare analysis platform; Cloud computing; Disease prediction; Apache Spark; Big data;
DOI
10.1007/978-3-319-59858-1_4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, data mining techniques such as classification, clustering, association and regression have been widely used in the healthcare field to help analyze and predict disease and to improve the quality and efficiency of medical services. This paper presents a web-based platform for big data analysis in healthcare using data mining techniques. The platform consists of three main layers: the Apache Spark Layer, the Workflow Layer and the Web Service Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular Resilient Distributed Dataset (RDD) operations. It also provides a cache mechanism that reuses previously computed results as much as possible. The Workflow Layer encapsulates a variety of data mining nodes, which play different roles such as data source, algorithm model or evaluation tool. These nodes can be organized into a workflow, which is a directed acyclic graph (DAG), and the workflow is then submitted to the Apache Spark Layer for execution. We have implemented many models for healthcare big data, including Naive Bayes, Decision Tree and Logistic Regression models. The Web Service Layer implements a rich RESTful API covering data uploading, workflow composition and analysis task submission. We also provide a graphical web interface. Through this interface, users can carry out efficient data mining without any programming, which can greatly help medical staff who do not know programming to diagnose patients' conditions more accurately and efficiently.
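The workflow idea in the abstract (nodes arranged in a DAG, executed in dependency order, with earlier results reused from a cache) can be illustrated with a minimal sketch. This is not the platform's code: the `Workflow` class, the node names and the toy threshold model are all hypothetical, plain Python stands in for Apache Spark, and the cache is keyed only by node name, which is a simplifying assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Hypothetical DAG workflow: each node is a function of its upstream results."""

    def __init__(self):
        self.funcs = {}     # node name -> callable
        self.deps = {}      # node name -> tuple of upstream node names
        self.cache = {}     # node name -> previously computed result
        self.executed = []  # nodes that were actually run (cache misses)

    def add_node(self, name, func, deps=()):
        self.funcs[name] = func
        self.deps[name] = tuple(deps)

    def run(self):
        # static_order() yields every node after all of its dependencies
        # and raises graphlib.CycleError if the graph is not acyclic.
        results = {}
        for name in TopologicalSorter(self.deps).static_order():
            if name not in self.cache:
                inputs = [self.cache[d] for d in self.deps[name]]
                self.cache[name] = self.funcs[name](*inputs)
                self.executed.append(name)
            results[name] = self.cache[name]
        return results


def load_rows():
    # Toy (feature, label) pairs standing in for an uploaded data set.
    return [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]


def fit_threshold(rows):
    # Toy "model": the midpoint between the mean feature value of each class.
    mean0 = sum(x for x, y in rows if y == 0) / sum(1 for _, y in rows if y == 0)
    mean1 = sum(x for x, y in rows if y == 1) / sum(1 for _, y in rows if y == 1)
    return (mean0 + mean1) / 2


def accuracy(rows, threshold):
    # Evaluation node: fraction of rows classified correctly by the threshold.
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


wf = Workflow()
wf.add_node("source", load_rows)
wf.add_node("model", fit_threshold, deps=["source"])
wf.add_node("evaluate", accuracy, deps=["source", "model"])
results = wf.run()
```

In this sketch, adding a further node and calling `run()` again executes only the new node, because the upstream results are already in the cache. A real cache would also have to notice when a node's function or inputs change.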
Pages: 32-43
Page count: 12