A Scalable Feature Selection and Model Updating Approach for Big Data Machine Learning

Cited by: 6
Authors
Yang, Baijian [1]
Zhang, Tonglin [2]
Affiliations
[1] Purdue Univ, Dept Comp & Informat Technol, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
DOI
10.1109/SmartCloud.2016.32
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
In this paper, we propose an innovative approach to feature selection and model updating in big data machine learning. Because hard drive access is the biggest barrier in big data problems, it is natural to reduce disk I/O operations when evaluating different combinations of features or updating a learning machine. In particular, we are interested in discovering whether sufficiently small matrices exist to represent a system, and whether such matrices can be computed in a row-by-row fashion to avoid reading data from the hard drive repeatedly. We examined the case of linear regression and proved that arrays of sufficient statistics can be used for feature selection and model updating. Algorithms were designed to compute the arrays in both single-processor and MapReduce fashion. The proposed approach can reduce the memory requirement to O(p^2), where p is the number of variables in the data set. Simulation results also demonstrated the effectiveness of the algorithms, with major computational improvements.
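The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it uses the standard least-squares sufficient statistics (X'X, X'y, y'y, n), and the function names `accumulate`, `merge`, and `fit_subset` are hypothetical names chosen for this example. The data are read once in chunks, only O(p^2) numbers are kept in memory, partial results are combined by addition in a MapReduce-like way, and any feature subset can then be fitted without touching the data again.

```python
import numpy as np

def accumulate(chunks, p):
    """Single pass over (X, y) chunks; stores only O(p^2) numbers."""
    xtx = np.zeros((p, p))   # running X'X
    xty = np.zeros(p)        # running X'y
    yty = 0.0                # running y'y
    n = 0                    # running row count
    for X, y in chunks:
        xtx += X.T @ X
        xty += X.T @ y
        yty += float(y @ y)
        n += X.shape[0]
    return xtx, xty, yty, n

def merge(a, b):
    """Reduce step: statistics from disjoint data partitions simply add."""
    return tuple(u + v for u, v in zip(a, b))

def fit_subset(stats, cols):
    """Least-squares fit on a feature subset using only the statistics."""
    xtx, xty, yty, _ = stats
    idx = np.asarray(cols)
    A = xtx[np.ix_(idx, idx)]        # sub-block of X'X for the chosen features
    b = xty[idx]
    beta = np.linalg.solve(A, b)     # assumes A is nonsingular
    rss = yty - b @ beta             # residual sum of squares, since A beta = b
    return beta, rss

# Synthetic data for illustration only (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, [0, 2]] @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=1000)
chunks = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 1000, 100)]

# Two "mappers" each summarize half of the chunks; the "reducer" merges them.
part_a = accumulate(chunks[:5], p=5)
part_b = accumulate(chunks[5:], p=5)
stats = merge(part_a, part_b)

# Evaluate one candidate feature subset without rereading the data.
beta, rss = fit_subset(stats, [0, 2])
```

Comparing candidate subsets then only requires repeated calls to `fit_subset` on the stored statistics, and a model can be updated with new rows by accumulating their statistics and merging them in.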
Pages: 146-151
Number of pages: 6
Related papers
50 in total
  • [1] Scalable Machine Learning with Granulated Data Summaries: A Case of Feature Selection
    Chadzynska-Krasowska, Agnieszka
    Betlinski, Pawel
    Slezak, Dominik
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 519 - 529
  • [2] Study on Feature Selection and Feature Deep Learning Model For Big Data
    Yu, Ping
    Yan, Hui
    [J]. 2018 3RD INTERNATIONAL CONFERENCE ON SMART CITY AND SYSTEMS ENGINEERING (ICSCSE), 2018, : 792 - 795
  • [3] Scalable and Accurate Online Feature Selection for Big Data
    Yu, Kui
    Wu, Xindong
    Ding, Wei
    Pei, Jian
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2016, 11 (02)
  • [4] Towards Scalable and Accurate Online Feature Selection for Big Data
    Yu, Kui
    Wu, Xindong
    Ding, Wei
    Pei, Jian
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 660 - 669
  • [5] Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach
    Tallon-Ballesteros, Antonio J.
    Riquelme, Jose C.
    [J]. BIOINSPIRED COMPUTATION IN ARTIFICIAL SYSTEMS, PT II, 2015, 9108 : 369 - 378
  • [6] Data Classification Using Feature Selection And kNN Machine Learning Approach
    Begum, Shemim
    Chakraborty, Debasis
    Sarkar, Ram
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 811 - 814
  • [7] Scalable malware detection system using big data and distributed machine learning approach
    Kumar, Manish
    [J]. Soft Computing, 2022, 26 : 3987 - 4003
  • [8] Scalable malware detection system using big data and distributed machine learning approach
    Kumar, Manish
    [J]. SOFT COMPUTING, 2022, 26 (08) : 3987 - 4003
  • [9] Feature Selection of Photoplethysmograph Data in Machine Learning
    Haq, Faris Atoil
    Sarno, Riyanarto
    Abdillah, Rifqi
    Amri, Taufiq Choirul
    Septiyanto, Abdullah Faqih
    Sungkono, Kelly Rossa
    [J]. 2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC, 2023, : 315 - 320
  • [10] An online approach for feature selection for classification in big data
    Nazar, Nasrin Banu
    Senthilkumar, Radha
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2017, 25 (01) : 163 - 171