A Distributed Integrated Feature Selection Scheme for Column Subset Selection

Cited by: 5
Authors
Xiao, Zheng [1]
Wei, PengCheng [1]
Chronopoulos, Anthony Theodore [2, 3]
Elster, Anne C. C. [4]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[3] Univ Patras, Dept Comp Engn & Informat, Rion 26500, Greece
[4] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7491 Trondheim, Norway
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Distributed databases; Measurement; Approximation algorithms; Optimized production technology; Information filters; Filtering algorithms; Feature selection; distributed integrated scheme; subset quality evaluation; column subset selection; PCA;
DOI
10.1109/TKDE.2021.3108146
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emerging computer applications often encounter huge volumes of data that need to be stored and processed in a distributed way. Most existing distributed feature selection schemes neglect the quality of the subsets that are mapped to the computational nodes, which wastes time and hardware resources. In this paper, we propose a distributed integrated feature selection scheme (DIFS) with Subset Quality Evaluation (SQE). SQE studies the relationship between the quality of a subset and the number of features selected from that subset, which helps shorten the feature selection time. The feature selection algorithm used in our scheme and the evaluation metric used in SQE are both integrable components. We then give an implementation of our scheme for the Column Subset Selection (CSS) problem. More specifically, we integrate a CSS algorithm into DIFS and use information entropy as the SQE metric. Theoretically, we prove that the speedup of DIFS over the centralized algorithm can reach $m^3$ in ideal situations, where $m$ is the number of computational nodes, and we give a well-bounded approximation guarantee for the solution generated by the scheme for the CSS problem. Extensive experiments on eight data sets verify the performance of the scheme. The experimental results demonstrate the effectiveness of SQE and the impressive speedup DIFS can achieve, although the reconstruction error increases slightly in some situations. Additional experiments on classification tasks reveal that DIFS performs better than existing state-of-the-art distributed algorithms.
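The abstract gives only an outline of the scheme, so the following is a rough sketch of the general idea rather than the authors' algorithm. It assumes, purely for illustration, that columns are partitioned at random across $m$ simulated nodes, that a subset's quality is the mean histogram entropy of its columns, that the selection budget $k$ is split in proportion to those scores, and that each node runs a simple greedy CSS routine. All function names (`subset_quality`, `allocate_budget`, `greedy_css`, `difs_sketch`) are hypothetical.

```python
import numpy as np

def column_entropy(col, bins=10):
    """Shannon entropy (in bits) of one column after histogram binning."""
    counts, _ = np.histogram(col, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def subset_quality(A_sub, bins=10):
    """Assumed quality score: mean column entropy of the subset."""
    return float(np.mean([column_entropy(A_sub[:, j], bins)
                          for j in range(A_sub.shape[1])]))

def allocate_budget(scores, k):
    """Split k selections across subsets in proportion to their scores
    (an assumed rule), using largest remainders for the leftover."""
    scores = np.asarray(scores, dtype=float)
    if scores.sum() == 0:
        scores = np.ones_like(scores)
    raw = k * scores / scores.sum()
    budget = np.floor(raw).astype(int)
    leftover = int(k - budget.sum())
    order = np.argsort(-(raw - budget))
    budget[order[:leftover]] += 1
    return budget

def greedy_css(A, k, tol=1e-12):
    """Greedy CSS: pick the column that removes the most residual
    Frobenius energy, project it out of the residual, and repeat."""
    R = A.astype(float).copy()
    available = np.ones(A.shape[1], dtype=bool)
    chosen = []
    for _ in range(min(k, A.shape[1])):
        norms = np.einsum("ij,ij->j", R, R)   # squared residual norms
        gain = ((R.T @ R) ** 2).sum(axis=0)   # ||R^T r_j||^2 per column
        scores = np.where(available & (norms > tol),
                          gain / np.maximum(norms, tol), -1.0)
        j = int(np.argmax(scores))
        if scores[j] <= 0:
            break                             # nothing left to explain
        chosen.append(j)
        available[j] = False
        q = R[:, j] / np.sqrt(norms[j])
        R -= np.outer(q, q @ R)
    return chosen

def reconstruction_error(A, cols):
    """Frobenius error of projecting A onto the span of the chosen columns."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro"))

def difs_sketch(A, k, m, seed=0):
    """Partition columns over m simulated nodes, score each subset,
    allocate the budget, and select locally on each subset."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(A.shape[1]), m)
    scores = [subset_quality(A[:, p]) for p in parts]
    budget = allocate_budget(scores, k)
    selected = []
    for p, b in zip(parts, budget):           # one iteration per node
        b = min(int(b), len(p))               # cannot exceed subset size
        selected.extend(int(p[j]) for j in greedy_css(A[:, p], b))
    return sorted(selected)

A = np.random.default_rng(1).normal(size=(50, 20))
cols = difs_sketch(A, k=6, m=4)
print(cols, reconstruction_error(A, cols))
```

In this sketch the loop over `parts` is sequential; in a real deployment each iteration would run on its own node, and the allocation rule and entropy estimator would follow the paper rather than the assumptions made here.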
Pages: 2193-2205
Number of pages: 13
Related papers
50 in total
  • [1] Distributed Column Subset Selection on MapReduce
    Farahat, Ahmed K.
    Elgohary, Ahmed
    Ghodsi, Ali
    Kamel, Mohamed S.
    [J]. 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 171 - 180
  • [2] Streaming and Distributed Algorithms for Robust Column Subset Selection
    Jiang, Shuli
    Li, Dongyu
    Li, Irene Mengze
    Mahankali, Arvind V.
    Woodruff, David P.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] An Integrated Feature Selection and Classification Scheme
    Peng, Yi
    Kou, Gang
    Ergu, Daji
    Wu, Wenshuai
    Shi, Yong
    [J]. STUDIES IN INFORMATICS AND CONTROL, 2012, 21 (03): : 241 - 248
  • [4] Greedy Column Subset Selection: New Bounds and Distributed Algorithms
    Altschuler, Jason
    Bhaskara, Aditya
    Fu, Gang
    Mirrokni, Vahab
    Rostamizadeh, Afshin
    Zadimoghaddam, Morteza
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] Iterative column subset selection
    Bruno Ordozgoiti
    Sandra Gómez Canaval
    Alberto Mozo
    [J]. Knowledge and Information Systems, 2018, 54 : 65 - 94
  • [6] Iterative column subset selection
    Ordozgoiti, Bruno
    Gomez Canaval, Sandra
    Mozo, Alberto
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 54 (01) : 65 - 94
  • [7] A Note on Column Subset Selection
    Youssef, Pierre
    [J]. INTERNATIONAL MATHEMATICS RESEARCH NOTICES, 2014, 2014 (23) : 6431 - 6447
  • [8] A distributed feature selection scheme with partial information sharing
    Aida Brankovic
    Luigi Piroddi
    [J]. Machine Learning, 2019, 108 : 2009 - 2034
  • [9] A distributed feature selection scheme with partial information sharing
    Brankovic, Aida
    Piroddi, Luigi
    [J]. MACHINE LEARNING, 2019, 108 (11) : 2009 - 2034
  • [10] Feature transformation and subset selection
    Liu, H
    Motoda, H
    [J]. IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS, 1998, 13 (02): : 26 - 28