Scalable Maximal Discernibility Discretization for Big Data

Times cited: 2

Authors:

Czolombitko, Michal [1]

Stepaniuk, Jaroslaw [1]

Affiliation:

[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland

Source:

ROUGH SETS | 2017 / Vol. 10313

Keywords:

Discretization of attributes; Rough sets; Apache Spark; ALGORITHM;

DOI:

10.1007/978-3-319-60837-2_51

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Discretization of numerical (continuous) attributes is one of the most important data preprocessing tasks in knowledge discovery and data mining, because some data mining techniques require discretized data. The aim of this article is to demonstrate that discretization methods that evaluate cuts with the discernibility measure can be parallelized on the Big Data platform Apache Spark. We therefore propose a distributed implementation of one of the best-known discretizers based on rough set methodology, the maximal discernibility discretizer. The experimental results in terms of scalability, speedup, and sizeup are quite promising.
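
The abstract does not spell out the algorithm. As an illustration only, here is a minimal single-machine Python sketch of the greedy maximal discernibility (MD) heuristic from rough set theory. It assumes numeric condition attributes, a single decision attribute, and midpoints between consecutive distinct values as candidate cuts. A cut is scored by how many still-undiscerned pairs of objects with different decisions it separates: within each block of the current partition this is L*R minus the sum over decisions of l_d*r_d, where L and R count objects left and right of the cut and l_d, r_d count those with decision d. To hint at why such a method parallelizes, the class counts are built per data partition (a "map") and the partial counts are simply summed (a "reduce"); plain Python lists stand in for Spark partitions. All function names (candidate_cuts, block_of, map_partition, best_cut, md_discretize) are hypothetical, and this is not the authors' Spark implementation.

```python
import operator
from bisect import bisect_left
from collections import Counter
from functools import reduce


def candidate_cuts(rows, n_attrs):
    """Candidate cuts: midpoints between consecutive distinct values of each attribute."""
    cuts = []
    for a in range(n_attrs):
        vals = sorted({x[a] for x, _ in rows})
        cuts.append([(u + v) / 2 for u, v in zip(vals, vals[1:])])
    return cuts


def block_of(x, chosen):
    """Block of the current partition: which side of every chosen cut x lies on."""
    return tuple(x[a] > c for a, c in chosen)


def map_partition(part, cuts, chosen):
    """'Map' step on one data partition: count objects per
    (block, attribute, interval between candidate cuts, decision)."""
    hist = Counter()
    for x, d in part:
        b = block_of(x, chosen)
        for a, cs in enumerate(cuts):
            hist[(b, a, bisect_left(cs, x[a]), d)] += 1
    return hist


def best_cut(hist, cuts):
    """Score every candidate cut with the discernibility measure
    W = sum over blocks of (L * R - sum_d l_d * r_d); return the best cut."""
    groups = {}
    for (b, a, k, d), n in hist.items():
        groups.setdefault((b, a), {}).setdefault(k, Counter())[d] += n
    scores = Counter()
    for (b, a), by_interval in groups.items():
        total = sum(by_interval.values(), Counter())
        left = Counter()
        for j in range(len(cuts[a])):
            left += by_interval.get(j, Counter())  # objects with x[a] <= cuts[a][j]
            right = total - left
            L, R = sum(left.values()), sum(right.values())
            same = sum(left[d] * right[d] for d in left)
            scores[(a, j)] += L * R - same  # conflicting pairs separated in this block
    if not scores:
        return None, 0
    (a, j), w = max(scores.items(), key=lambda kv: kv[1])
    return (a, cuts[a][j]), w


def md_discretize(rows, n_attrs, n_partitions=4):
    """Greedy MD heuristic: keep adding the cut that discerns the most
    still-undiscerned pairs of objects with different decisions."""
    cuts = candidate_cuts(rows, n_attrs)
    # Plain lists stand in for distributed data partitions.
    parts = [rows[i::n_partitions] for i in range(n_partitions)]
    chosen = []
    while True:
        # 'Reduce' step: partial histograms are additive, so summing them is enough.
        hist = reduce(operator.add,
                      (map_partition(p, cuts, chosen) for p in parts), Counter())
        cut, w = best_cut(hist, cuts)
        if w == 0:  # no cut separates any remaining conflicting pair
            return sorted(chosen)
        chosen.append(cut)
```

For six objects on one attribute with values 1 to 6 and decisions 0, 0, 0, 1, 1, 0, the sketch first picks the cut 3.5 (it separates 6 of the 8 conflicting pairs), then 5.5, after which every block is decision-homogeneous. Because the per-partition counts are plain sums, the chosen cuts do not depend on n_partitions, which is the property a Spark version would rely on.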

Pages: 644-654

Number of pages: 11