MapReduce based parallel attribute reduction in Incomplete Decision Systems

Cited by: 6
Authors
Sowkuntla, Pandu [2]
Dunna, Sravya [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] OC Tanner India Pvt Ltd, Hyderabad 500032, Telangana, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Attribute reduction; Incomplete decision systems; MapReduce; Rough set theory; Vertical partitioning; Horizontal partitioning; FEATURE-SELECTION; ROUGH; ALGORITHMS;
DOI
10.1016/j.knosys.2020.106677
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The scale of data collected today from real-world applications is massive. This data can also include missing (incomplete) values, giving rise to large-scale incomplete decision systems (IDS). Parallel attribute reduction in big data is an essential preprocessing step for scalable machine learning model construction. Rough set theory has been used as a powerful tool for attribute reduction in complete decision systems (CDS), and extensions to classical rough set theory have been proposed to deal with IDS. Much research has been done on efficient attribute reduction in IDS using these extensions, but no parallel/distributed approaches have been proposed for attribute reduction in large-scale IDS, because processing large-scale IDS is difficult owing to its two challenges: scale and incompleteness. To address these challenges, we propose MapReduce based parallel/distributed approaches for attribute reduction in massive IDS. The proposed approaches resolve the challenge of incompleteness with the existing Novel Granular Framework (NGF), and each follows a different data partitioning strategy to handle data sets that are large-scale in terms of the number of objects and attributes. One of the proposed approaches adopts an alternative representation of the NGF and uses horizontal partitioning (division in object space) of the data across the nodes of the cluster. Another approach embraces the existing NGF and uses vertical partitioning (division in attribute space) of the data. Extensive experimental analysis was carried out on various data sets with different percentages of incompleteness in the data. The experimental results show that the horizontal partitioning based approach performs well for data sets with a massive object space, and that the vertical partitioning based approach is relevant and scales well for extremely high dimensional data sets. (C) 2020 Elsevier B.V. All rights reserved.
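The two partitioning strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm and does not implement the NGF; the function names, the round-robin assignment, and the toy table are assumptions made for illustration only. Missing values are represented as `None`, and only conditional attributes are shown.

```python
from typing import List, Optional, Tuple

# One object (row) of conditional attribute values; None marks a missing value.
Row = List[Optional[str]]


def horizontal_partition(rows: List[Row], n_parts: int) -> List[List[Row]]:
    """Division in object space: each part gets a subset of the objects,
    with all attributes kept (round-robin assignment is an assumption)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def vertical_partition(rows: List[Row], n_parts: int) -> List[Tuple[List[int], List[Row]]]:
    """Division in attribute space: each part gets all objects, restricted
    to a subset of attribute columns. Returns (column indices, sub-table)."""
    n_attrs = len(rows[0]) if rows else 0
    parts = []
    for p in range(n_parts):
        cols = list(range(p, n_attrs, n_parts))
        parts.append((cols, [[row[c] for c in cols] for row in rows]))
    return parts


# Hypothetical toy table: 4 objects, 3 conditional attributes, some values missing.
table: List[Row] = [
    ["a", None, "x"],
    ["b", "1", None],
    ["a", "2", "y"],
    ["c", None, "x"],
]

h_parts = horizontal_partition(table, 2)  # two parts, each with 2 objects x 3 attributes
v_parts = vertical_partition(table, 2)    # columns [0, 2] on one part, column [1] on the other
```

In this sketch, horizontal partitioning keeps every object whole, which suits data sets with many objects, while vertical partitioning keeps every attribute column whole, which suits data sets with many attributes.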
Pages: 14
Related papers
50 in total
  • [21] Attribute Reduction in Incomplete Information Systems
    Sun, Shibuo
    Duan, Jianhui
    Wanyan, Dandan
    [J]. ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004 : 587 - 594
  • [22] Attribute Reduction based on Generalized Similarity Relation in Incomplete Decision System
    Yang Jilin
    Qin Keyun
    Du Weifeng
    [J]. ISIP: 2009 INTERNATIONAL SYMPOSIUM ON INFORMATION PROCESSING, PROCEEDINGS, 2009, : 63 - +
  • [23] Analysis of Attribute Reduction of Incomplete Decision Table Based on Information Entropy
    Du Yue
    Zhang Xu
    Chen Dai-mei
    Wang Yu-mei
    [J]. PROCEEDINGS OF 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION (ICICTA 2015), 2015, : 178 - 181
  • [24] MapReduce based parallel fuzzy-rough attribute reduction using discernibility matrix
    Pandu Sowkuntla
    P. S. V. S. Sai Prasad
    [J]. Applied Intelligence, 2022, 52 : 154 - 173
  • [25] MapReduce based parallel fuzzy-rough attribute reduction using discernibility matrix
    Sowkuntla, Pandu
    Prasad, P. S. V. S. Sai
    [J]. APPLIED INTELLIGENCE, 2022, 52 (01) : 154 - 173
  • [26] A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets
    Meng, Zuqiang
    Shi, Zhongzhi
    [J]. INFORMATION SCIENCES, 2009, 179 (16) : 2774 - 2793
  • [27] Attribute Reduction in Decision Systems Based on Relation Matrix
    ZHONG Cheng
    [J]. Journal of Zhejiang Ocean University (Natural Science), 2010, 29 (05) : 507 - 514
  • [28] Attribute selection based on a new conditional entropy for incomplete decision systems
    Dai, Jianhua
    Wang, Wentao
    Tian, Haowei
    Liu, Liang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2013, 39 : 207 - 213
  • [29] Attribute Reduction on Distributed Incomplete Decision Information System
    Hu, Jun
    Wang, Kai
    Yu, Hong
    [J]. ROUGH SETS, 2017, 10313 : 289 - 305
  • [30] Parallel Attribute Reduction Algorithm for Complex Heterogeneous Data Using MapReduce
    Zhang, Tengfei
    Ma, Fumin
    Cao, Jie
    Peng, Chen
    Yue, Dong
    [J]. COMPLEXITY, 2018,