A Parallel Matrix-Based Method for Computing Approximations in Incomplete Information Systems

Cited by: 73
Authors
Zhang, Junbo [1, 2]
Wong, Jian-Syuan [2]
Pan, Yi [2]
Li, Tianrui [1]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 610031, Peoples R China
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Funding
National Science Foundation (USA)
Keywords
Rough sets; data mining; MapReduce; matrix; incomplete information systems; attribute reduction
DOI
10.1109/TKDE.2014.2330821
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the volume of data grows at an unprecedented rate, large-scale data mining and knowledge discovery present a tremendous challenge. Rough set theory, which has been used successfully to solve problems in pattern recognition, machine learning, and data mining, centers on the idea that a set of distinct objects may be approximated by a lower and an upper bound. To obtain the benefits that rough sets can provide for data mining and related tasks, efficient computation of these approximations is vital. MapReduce, a recently introduced cloud computing model, has gained considerable attention from the scientific community for its applicability to large-scale data analysis. In previous research, we proposed a MapReduce-based method for computing approximations in parallel, which processes complete data efficiently but fails on missing (incomplete) data. To address this shortcoming, three parallel matrix-based methods are introduced to process large-scale, incomplete data. All of them are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system. Experiments show that the proposed parallel methods are efficient for processing large-scale data.
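The idea of lower and upper approximations over incomplete data can be illustrated with a small sequential sketch. The abstract does not say which relation the authors use, so this example assumes a tolerance relation in which a missing value matches any value, a common choice for incomplete information systems. The names `MISSING`, `tolerance_matrix`, and `approximations` are hypothetical, and the code is not the MapReduce or Twister implementation the abstract describes.

```python
MISSING = None  # hypothetical marker for a missing attribute value


def tolerance_matrix(rows):
    """Boolean matrix T where T[i][j] is True when objects i and j agree on
    every attribute, treating a missing value as matching anything
    (assumed tolerance relation, not taken from the abstract)."""
    n = len(rows)
    return [
        [
            all(a is MISSING or b is MISSING or a == b
                for a, b in zip(rows[i], rows[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def approximations(rows, target):
    """Return (lower, upper) approximations of `target`, a set of row indices."""
    t = tolerance_matrix(rows)
    n = len(rows)
    lower, upper = set(), set()
    for i in range(n):
        # Row i of the matrix lists every object indistinguishable from i.
        tolerance_class = {j for j in range(n) if t[i][j]}
        if tolerance_class <= target:   # class lies entirely inside the target
            lower.add(i)
        if tolerance_class & target:    # class overlaps the target
            upper.add(i)
    return lower, upper


rows = [
    ("high", "yes"),
    ("high", MISSING),
    ("low", "no"),
    ("low", "no"),
    ("mid", "yes"),
]
lower, upper = approximations(rows, target={0, 4})
print(lower, upper)
```

Object 1 has a missing value and therefore cannot be distinguished from object 0, so object 0 drops out of the lower approximation while both objects remain in the upper one. Each row of the matrix can be computed independently of the others, which is what makes a matrix formulation a natural fit for parallel execution.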
Pages: 326-339
Page count: 14