Knowledge process of health big data using MapReduce-based associative mining

Cited by: 17

Authors:

Choi, So-Young [1]

Chung, Kyungyong [2]

Affiliations:

[1] Kyonggi Univ, Dept Comp Sci, Data Min Lab, 154-42 Gwanggyosan Ro, Suwon 16227, Gyeonggi Do, South Korea

[2] Kyonggi Univ, Div Comp Sci & Engn, 154-42 Gwanggyosan Ro, Suwon 16227, Gyeonggi Do, South Korea

Source:

PERSONAL AND UBIQUITOUS COMPUTING | 2020 / Vol. 24 / No. 05

Keywords:

Data mining; Knowledge process; Associative mining; Healthcare; MapReduce;

DOI:

10.1007/s00779-019-01230-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Big-data knowledge processing technology facilitates efficient health management services by systematically collecting and processing information through distributed/parallel processing based on the health platform's common data model. It thereby enables knowledge expansion for healthcare data. In this study, we propose a big-data knowledge process for the health industry that applies association mining with Hadoop MapReduce. The proposed method provides efficient health management knowledge services by collecting and processing heterogeneous health information using WebBot and the common data model. Hadoop is an open-source framework for efficiently processing big data in a distributed manner. The proposed knowledge processing model combines MapReduce-based distributed processing with mining-based association discovery. The MapReduce input consists of chronic disease terms extracted from health big data. The input corpus is split into blocks of a fixed size, and a map task is created for each block. The map function of each map task's mapper produces <key, value> pairs. In the map phase, keys are generated in the same way as candidate frequent item sets in the Apriori algorithm: the keys form a set of $2^p$ item sets, and each key's value is its occurrence frequency. Combining, that is, summing the values of identical keys locally, reduces both the volume of intermediate data and the program's processing load. Each key is then assigned to a reducer by hash partitioning and stored in the corresponding reduce task. In the reduce phase, the map output is distributed to the reducers, where it is sorted and merged by key. The reduce function sums the values for each key, and keys whose totals fail to meet the minimum support criterion are eliminated.
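The map, combine, partition, and reduce steps described above can be sketched in a single Python process. This is a minimal simulation under stated assumptions, not Hadoop code and not the authors' implementation: the function names (`map_task`, `combine`, `partition`, `reduce_task`, `frequent_itemsets`), the toy disease records, the block size, the reducer count, and the absolute minimum-support count are all hypothetical. The mapper enumerates every non-empty subset of each record, which is one reading of the abstract's $2^p$ keys (the empty set is skipped); the paper may instead generate keys level by level as classic Apriori does.

```python
from collections import Counter, defaultdict
from itertools import combinations
import zlib


def split_into_blocks(records, block_size):
    """Divide the input into fixed-size blocks; each block becomes one map task."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_task(block):
    """Map: for a record with p items, emit (itemset, 1) for its 2^p - 1 non-empty subsets."""
    for record in block:
        items = sorted(record)  # sorted tuples give one canonical key per item set
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                yield subset, 1


def combine(pairs):
    """Combiner: sum the values of identical keys locally to shrink intermediate data."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def partition(key, num_reducers):
    """Hash partitioning: a deterministic hash of the key selects the reducer."""
    return zlib.crc32("|".join(key).encode("utf-8")) % num_reducers


def reduce_task(pairs, min_support_count):
    """Reduce: sort and merge by key, sum values, drop keys below minimum support."""
    merged = defaultdict(int)
    for key, value in sorted(pairs):
        merged[key] += value
    return {key: total for key, total in merged.items() if total >= min_support_count}


def frequent_itemsets(records, block_size=2, num_reducers=3, min_support_count=2):
    # Shuffle: route each combined (key, value) pair to its reducer's input list.
    reducer_inputs = defaultdict(list)
    for block in split_into_blocks(records, block_size):
        for key, value in combine(map_task(block)).items():
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    result = {}
    for r in range(num_reducers):
        result.update(reduce_task(reducer_inputs[r], min_support_count))
    return result


if __name__ == "__main__":
    # Hypothetical records: chronic-disease terms co-occurring in one document each.
    records = [
        frozenset({"diabetes", "hypertension"}),
        frozenset({"diabetes", "hypertension", "obesity"}),
        frozenset({"hypertension", "obesity"}),
        frozenset({"diabetes", "obesity"}),
    ]
    for itemset, count in sorted(frequent_itemsets(records).items()):
        print(itemset, count)
```

Using `zlib.crc32` instead of Python's built-in `hash()` keeps reducer assignments stable across runs, because string hashing is randomized per process. Enumerating all subsets grows exponentially with record length, so this sketch only suits short records.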
As a result, the frequent item sets that meet the minimum support criterion are extracted from the <key, value> pairs. Association rules are then derived between the items that make up each frequent item set, and their support and confidence (reliability) are calculated to test whether the items are actually associated. The higher the frequency of an item set, the higher its support and confidence, which indicates a clearer association. A knowledge base is then built from the extracted association rules by repeating the MapReduce process. The resulting knowledge bases are closely associated and can be semantically related in real time with high probability. Furthermore, mining-based knowledge processing of health big data infers more meaningful associations between chronic diseases. The proposed method adds technological value and intelligent efficiency that help the health and medical fields promote healthy lives.
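Rule generation from the frequent item sets can be sketched the same way. The sketch assumes the standard definitions: for a rule X => Y drawn from a frequent item set X ∪ Y, support is count(X ∪ Y) / N over N records, and confidence (the abstract's reliability) is count(X ∪ Y) / count(X). The function name `association_rules`, the input counts, and the minimum-confidence threshold are hypothetical illustrations, not values from the paper.

```python
from itertools import combinations


def association_rules(itemset_counts, num_records, min_confidence):
    """Derive rules X => Y from frequent item sets, with support and confidence.

    itemset_counts maps a sorted tuple of items to its occurrence count. By the
    Apriori property every subset of a frequent item set is also frequent, so the
    antecedent's count is available in the same dictionary.
    """
    rules = []
    for itemset, count in itemset_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a non-empty consequent
        support = count / num_records  # fraction of records containing X ∪ Y
        for k in range(1, len(itemset)):
            # combinations() of a sorted tuple yields sorted tuples, matching the keys
            for antecedent in combinations(itemset, k):
                consequent = tuple(i for i in itemset if i not in antecedent)
                confidence = count / itemset_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical frequent item set counts over 4 records (illustrative only).
    counts = {
        ("diabetes",): 3,
        ("hypertension",): 3,
        ("obesity",): 3,
        ("diabetes", "hypertension"): 2,
        ("diabetes", "obesity"): 2,
        ("hypertension", "obesity"): 2,
    }
    for x, y, sup, conf in association_rules(counts, num_records=4, min_confidence=0.6):
        print(f"{x} => {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Raising `min_confidence` prunes weaker rules; with these toy counts every pair rule has confidence 2/3, so a threshold of 0.7 would keep none of them.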


Pages: 571-581

Page count: 11
