Scalable Maximal Discernibility Discretization for Big Data

Cited by: 2
Authors
Czolombitko, Michal [1]
Stepaniuk, Jaroslaw [1]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Wiejska 45A, PL-15351 Bialystok, Poland
Source
ROUGH SETS | 2017 / Volume 10313
Keywords
Discretization of attributes; Rough sets; Apache Spark; ALGORITHM;
DOI
10.1007/978-3-319-60837-2_51
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Discretization of numerical (continuous) attributes is one of the most important data preprocessing tasks in knowledge discovery and data mining, and some data mining techniques require discretized data. The aim of this article is to demonstrate that discretization methods that use the discernibility measure to evaluate cuts can be parallelized on the Big Data platform Apache Spark. We thus propose a distributed implementation of one of the best-known discretizers based on rough set methodology. The experimental results in terms of scalability, speedup and sizeup are quite promising.
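As an illustration of what "using the discernibility measure to evaluate cuts" can mean, here is a minimal single-machine sketch in Python. It is not the authors' Spark implementation, and the function names (`discerned_pairs`, `candidate_cuts`, `best_cut`) are hypothetical. It assumes the usual rough-set reading of the measure: a cut on one numeric attribute is scored by the number of object pairs with different decision labels that fall on opposite sides of it.

```python
from collections import Counter


def discerned_pairs(values, labels, cut):
    """Count pairs of objects with different labels that `cut` separates."""
    left = Counter(l for v, l in zip(values, labels) if v < cut)
    right = Counter(l for v, l in zip(values, labels) if v >= cut)
    all_pairs = sum(left.values()) * sum(right.values())
    # Pairs that share a decision label do not need to be discerned.
    same_label_pairs = sum(left[c] * right[c] for c in left)
    return all_pairs - same_label_pairs


def candidate_cuts(values):
    """Midpoints between consecutive distinct attribute values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def best_cut(values, labels):
    """Return the candidate cut with the highest score, or None if there is none."""
    cuts = candidate_cuts(values)
    if not cuts:
        return None
    return max(cuts, key=lambda c: discerned_pairs(values, labels, c))


# Toy attribute column (assumed data, for illustration only).
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
print(best_cut(values, labels))  # prints 2.5
```

The sketch covers only one greedy step on one attribute. A full discretizer would repeat the selection over all attributes until the remaining pairs are discerned, and a distributed version would presumably spread the per-cut counting across partitions.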
Pages: 644 - 654
Number of pages: 11
Related papers
  • [1] Maximal Discernibility Discretization of Attributes-A FPGA Approach
    Kopczynski, Maciej
    Grzes, Tomasz
    Stepaniuk, Jaroslaw
    MACHINE INTELLIGENCE AND BIG DATA IN INDUSTRY, 2016, 19 : 171 - 180
  • [2] Data discretization: taxonomy and big data challenge
    Ramirez-Gallego, Sergio
    Garcia, Salvador
    Mourino-Talin, Hector
    Martinez-Rego, David
    Bolon-Canedo, Veronica
    Alonso-Betanzos, Amparo
    Manuel Benitez, Jose
    Herrera, Francisco
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2016, 6 (01) : 5 - 21
  • [3] Parabolic Threshold Discretization for Big Data
    Lounes, Naima
    Remil, Zakaria
    Oudghiri, Houria
    Chalal, Rachid
    Hidouci, Walid-Khaled
    INFORMATION SYSTEMS AND TECHNOLOGIES, WORLDCIST 2022, VOL 1, 2022, 468 : 66 - 74
  • [4] Scalable data summarization on big data
    Feifei Li
    Suman Nath
    Distributed and Parallel Databases, 2014, 32 : 313 - 314
  • [5] Scalable data summarization on big data
    Li, Feifei
    Nath, Suman
    DISTRIBUTED AND PARALLEL DATABASES, 2014, 32 (03) : 313 - 314
  • [6] Scalable Mining of Big Data
    Leung, Carson K.
    Pazdor, Adam G. M.
    Zheng, Hao
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 240 - 247
  • [7] Feature selection based on maximal neighborhood discernibility
    Changzhong Wang
    Qiang He
    Mingwen Shao
    Qinghua Hu
    International Journal of Machine Learning and Cybernetics, 2018, 9 : 1929 - 1940
  • [8] Scalable Euclidean Embedding for Big Data
    Alavi, Zohreh
    Sharma, Sagar
    Zhou, Lu
    Chen, Keke
    2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 773 - 780
  • [9] A Scalable Big Data Test Framework
    Li, Nan
    Escalona, Anthony
    Guo, Yun
    Offutt, Jeff
    2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), 2015,
  • [10] Clouds for scalable Big Data processing
    Trunfio, Paolo
    Vlassov, Vladimir
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 629 - 631