Parallel Incremental Dynamic Attribute Reduction Algorithm Based on Attribute Tree

Cited by: 0
Authors
Qin T. [1 ]
Ding W. [1 ]
Ju H. [1 ]
Li M. [1 ]
Huang J. [1 ]
Chen Y. [1 ]
Wang H. [1 ]
Affiliations
[1] School of Information Science and Technology, Nantong University, Nantong
Funding
China Postdoctoral Science Foundation; Russian Science Foundation; National Natural Science Foundation of China
Keywords
Attribute Reduction; Attribute Tree; Incremental Learning; Knowledge Granularity; Parallel Computing; Spark Framework;
DOI
10.16451/j.cnki.issn1003-6059.202210007
Abstract
Traditional incremental methods mainly address attribute reduction from the perspective of updating approximations. When processing large-scale datasets, however, they must repeatedly evaluate all attributes and recompute their significance, which increases time complexity and reduces efficiency. To address these problems, a parallel incremental acceleration strategy based on attribute trees is proposed. The key step is to cluster all attributes into multiple attribute trees for parallel dynamic attribute evaluation. First, an appropriate attribute tree is selected for attribute evaluation according to an attribute-tree correlation measure, reducing the time complexity. Then, a branch coefficient is added to the stopping criterion and increased dynamically as the branch depth grows, so that the algorithm exits the loop automatically once the maximum threshold is reached, avoiding redundant computation and improving efficiency. Based on these improvements, an incremental dynamic attribute reduction algorithm based on attribute trees is proposed, and a parallel incremental dynamic attribute reduction algorithm based on attribute trees is designed by combining it with the Spark parallel mechanism. Finally, experiments on multiple datasets show that the proposed algorithm significantly improves the search efficiency of reduction on dynamically changing datasets while maintaining classification performance, giving it a clear performance advantage. © 2022 Journal of Pattern Recognition and Artificial Intelligence. All rights reserved.
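The abstract outlines the main loop of the method: attributes are grouped into attribute trees, a knowledge-granularity-based significance measure decides which tree to descend, and a branch coefficient bounds how deep the search goes before the loop exits. The following is a minimal, single-machine Python sketch of that idea, assuming the standard rough-set definition of knowledge granularity, GK(B) = Σ|X_i|²/|U|². The tree construction, the helper names, and the parameters n_trees, branch_coef, and max_depth are illustrative assumptions rather than the paper's exact definitions, and the per-tree evaluation that the paper distributes with Spark is kept sequential here.

```python
# A minimal, single-machine sketch of attribute-tree-guided reduction driven by
# knowledge granularity. Function names (build_attribute_trees, tree_guided_reduct),
# the round-robin tree construction, and the parameters n_trees / branch_coef /
# max_depth are illustrative assumptions, not the paper's exact formulation.
from collections import defaultdict

def partition(data, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks = defaultdict(list)
    for i, row in enumerate(data):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def knowledge_granularity(data, attrs):
    """GK(B) = sum(|X_i|^2) / |U|^2 over the blocks of IND(B)."""
    n = len(data)
    return sum(len(b) ** 2 for b in partition(data, attrs)) / (n * n)

def conditional_gk(data, attrs, dec):
    """GK(D | B) = GK(B) - GK(B + {D}); smaller means B discerns D better."""
    return knowledge_granularity(data, attrs) - knowledge_granularity(data, attrs + [dec])

def significance(data, red, a, dec):
    """Drop in conditional granularity when attribute a joins the current reduct."""
    return conditional_gk(data, red, dec) - conditional_gk(data, red + [a], dec)

def build_attribute_trees(data, cond, dec, n_trees):
    """Group condition attributes into n_trees 'attribute trees' (here: a simple
    round-robin split over attributes ranked by individual significance)."""
    ranked = sorted(cond, key=lambda a: significance(data, [], a, dec), reverse=True)
    return [ranked[i::n_trees] for i in range(n_trees)]

def tree_guided_reduct(data, cond, dec, n_trees=3, branch_coef=1, max_depth=6):
    """Greedy reduction that only inspects the next candidate of each tree instead of
    rescanning every attribute; it stops early once the branch-depth threshold is hit
    or the reduct discerns the decision as well as the full attribute set."""
    trees = build_attribute_trees(data, cond, dec, n_trees)
    cursors = [0] * n_trees            # next unexamined attribute in each tree
    red, depth = [], 0
    target = conditional_gk(data, list(cond), dec)   # granularity of the full set
    while conditional_gk(data, red, dec) > target + 1e-12 and depth < max_depth:
        best = None                    # (significance, tree index, attribute)
        for t, tree in enumerate(trees):
            if cursors[t] < len(tree):
                a = tree[cursors[t]]
                sig = significance(data, red, a, dec)
                if best is None or sig > best[0]:
                    best = (sig, t, a)
        if best is None or best[0] <= 0:
            break                      # no tree offers an informative attribute
        _, t, a = best
        red.append(a)
        cursors[t] += 1
        depth += branch_coef           # branch depth grows with the branch coefficient
    return red

if __name__ == "__main__":
    # Toy decision table: columns 0-3 are condition attributes, column 4 is the decision.
    table = [
        (0, 1, 1, 0, "yes"), (0, 1, 0, 1, "yes"), (0, 0, 1, 1, "yes"),
        (1, 0, 1, 0, "no"),  (1, 0, 0, 1, "no"),  (1, 1, 0, 0, "no"),
    ]
    print(tree_guided_reduct(table, cond=[0, 1, 2, 3], dec=4))   # prints [0]
```

In the paper's parallel variant, each attribute tree would be evaluated as an independent Spark task and only the best candidate per tree collected back to the driver; the sketch keeps that step as a plain loop over trees.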
Pages: 939-951
Number of pages: 12