A Big Data Analysis Platform for Healthcare on Apache Spark

Cited by: 0

Authors:

Zhang, Jinwei [1]

Zhang, Yong [1]

Hu, Qingcheng [1]

Tian, Hongliang [1]

Xing, Chunxiao [1]

Affiliations:

[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Res Inst Informat Technol, Beijing 100084, Peoples R China

Source:

SMART HEALTH, ICSH 2016 | 2017 / Vol. 10219

Keywords:

Healthcare analysis platform; Cloud computing; Disease prediction; Apache Spark; Big data;

DOI:

10.1007/978-3-319-59858-1_4

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In recent years, data mining techniques such as classification, clustering, association, and regression have been widely used in healthcare to help analyze and predict disease and to improve the quality and efficiency of medical services. This paper presents a web-based platform for big data analysis in healthcare using data mining techniques. The platform consists of three main layers: the Apache Spark Layer, the Workflow Layer, and the Web Service Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular Resilient Distributed Dataset (RDD) operations. It also provides a cache mechanism that reuses previously computed results as much as possible. The Workflow Layer encapsulates a variety of data mining nodes that play different roles, such as data source, algorithm model, or evaluation tool. These nodes can be organized into a workflow, which is a directed acyclic graph (DAG), and the workflow is then submitted to the Apache Spark Layer for execution. We have implemented many models for healthcare big data, including Naive Bayes, Decision Tree, and Logistic Regression. The Web Service Layer implements a rich RESTful API covering data uploading, workflow composition, and analysis task submission. We also provide a graphical web interface. Through this interface, users can perform data mining efficiently without any programming, which can greatly help medical staff who have no programming background diagnose patients' conditions more accurately and efficiently.
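The workflow idea in the abstract (nodes arranged in a DAG, with earlier results reused through a cache) can be illustrated with a minimal sketch. This is not the platform's implementation: the `Workflow` class, its method names, the lineage-based cache key, and the toy data are all hypothetical, and plain Python stands in for Apache Spark. The cache key assumes that a node's name uniquely identifies its function, which a real system would need to verify.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Toy workflow (hypothetical): nodes form a DAG, results are cached by lineage."""

    def __init__(self):
        self.nodes = {}      # node name -> (function, upstream node names)
        self.cache = {}      # lineage key -> result computed earlier
        self.computed = []   # names of nodes actually executed, for illustration

    def add_node(self, name, func, deps=()):
        self.nodes[name] = (func, tuple(deps))

    def _lineage(self, name):
        # A node's key covers its own name and everything upstream of it.
        # Assumption: a name always refers to the same function.
        _, deps = self.nodes[name]
        return (name, tuple(self._lineage(d) for d in deps))

    def run(self):
        graph = {name: set(deps) for name, (_, deps) in self.nodes.items()}
        results = {}
        # static_order() yields upstream nodes first and raises CycleError on a cycle.
        for name in TopologicalSorter(graph).static_order():
            func, deps = self.nodes[name]
            key = self._lineage(name)
            if key not in self.cache:
                self.cache[key] = func(*(results[d] for d in deps))
                self.computed.append(name)
            results[name] = self.cache[key]
        return results


# Three node roles named in the abstract: data source, model, evaluation tool.
wf = Workflow()
wf.add_node("source", lambda: [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
# Stand-in "model": a threshold at the mean of the feature values.
wf.add_node("model", lambda rows: sum(x for x, _ in rows) / len(rows), deps=["source"])
# Stand-in "evaluation": accuracy of the threshold rule on the same rows.
wf.add_node(
    "evaluate",
    lambda rows, threshold: sum((x > threshold) == bool(y) for x, y in rows) / len(rows),
    deps=["source", "model"],
)

first = wf.run()
second = wf.run()  # every node is served from the cache on the second run
```

In this sketch the second call to `run()` executes no node functions, because each lineage key is already present in the cache. On Spark, a comparable effect would typically rely on persisting intermediate RDDs rather than on an in-process dictionary.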

Pages: 32-43

Page count: 12

50 records in total

[21] Big Data Network Flow Processing Using Apache Spark
Jerabek, Kamil
Rysavy, Ondrej
[J]. PROCEEDINGS OF THE 6TH CONFERENCE ON THE ENGINEERING OF COMPUTER BASED SYSTEMS (ECBS 2019), 2020,
[22] Apache Spark Methods and Techniques in Big Data-A Review
Sahana, H. P.
Sanjana, M. S.
Muddasir, N. Mohammed
Vidyashree, K. P.
[J]. INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES, ICICCT 2019, 2020, 89 : 721 - 726
[23] MaRe: Processing Big Data with application containers on Apache Spark
Capuccini, Marco
Dahlo, Martin
Toor, Salman
Spjuth, Ola
[J]. GIGASCIENCE, 2020, 9 (05):
[24] Big Data Analytics for the ATLAS EventIndex Project with Apache Spark
Casani, Alvaro Fernandez
Montoro, Carlos Garcia
de la Hoz, Santiago Gonzalez
Salt, Jose
Sanchez, Javier
Perez, Miguel Villaplana
[J]. COMPUTATIONAL AND MATHEMATICAL METHODS, 2023, 2023
[25] SparkJNI: A Toolchain for Hardware Accelerated Big Data Apache Spark
Voicu, Tudor Alexandru
Al-Ars, Zaid
[J]. 2019 4TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (ICBDA 2019), 2019, : 152 - 157
[26] Big Data Machine Learning using Apache Spark MLlib
Assefi, Mehdi
Behravesh, Ehsun
Liu, Guangchi
Tafti, Ahmad P.
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 3492 - 3498
[27] BigDebug: Interactive Debugger for Big Data Analytics in Apache Spark
Gulzar, Muhammad Ali
Interlandi, Matteo
Condie, Tyson
Kim, Miryung
[J]. FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2016, : 1033 - 1037
[28] Classifying Short Unstructured Data Using the Apache Spark Platform
Castro, Eduardo P. S.
Chakravarty, Saurabh
Williamson, Eric
Pereira, Denilson Alves
Fox, Edward A.
[J]. 2017 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2017), 2017, : 129 - 138
[29] Analysis of Big Data in Healthcare and Life Sciences Using Hive and Spark
Hanuman, A. Sai
Soujanya, R.
Madhuri, P. M.
[J]. DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT-2K19, 2020, 1079 : 825 - 840
[30] CMS Analysis and Data Reduction with Apache Spark
Gutsche, Oliver
Canali, Luca
Cremer, Illia
Cremonesi, Matteo
Elmer, Peter
Fisk, Ian
Girone, Maria
Jayatilaka, Bo
Kowalkowski, Jim
Khristenko, Viktor
Motesnitsalis, Evangelos
Pivarski, Jim
Sehrish, Saba
Surdy, Kacper
Svyatkovskiy, Alexey
[J]. 18TH INTERNATIONAL WORKSHOP ON ADVANCED COMPUTING AND ANALYSIS TECHNIQUES IN PHYSICS RESEARCH (ACAT2017), 2018, 1085