Streaming feature selection algorithms for big data: A survey

Times cited: 33
Authors
AlNuaimi, Noura [1]
Masud, Mohammad Mehedy [1]
Serhani, Mohamed Adel [1]
Zaki, Nazar [1]
Affiliation
[1] United Arab Emirates University, College of Information Technology, Al Ain, United Arab Emirates
Keywords
Big data; Redundant features; Relevant features; Streaming feature grouping; Streaming feature selection; Online feature selection; Mutual information; Granulation; Relevance; Entropy
DOI
10.1016/j.aci.2019.01.001
Chinese Library Classification
TP (Automation technology, computer technology)
Subject classification code
0812
Abstract
Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to improve these organizations' decisions in real time. However, storing and processing large and varied datasets (known as big data) in real time is challenging. In machine learning, streaming feature selection is widely considered an effective technique for selecting a relevant subset of features from high-dimensional data and thereby reducing learning complexity. In the literature, streaming feature selection refers to the setting in which features arrive consecutively over time: the total number of features is not known in advance, whereas the number of instances is fixed. Many scholars have proposed streaming feature selection algorithms in attempts to solve this problem. This paper presents an exhaustive and methodical introduction to these techniques. It first reviews traditional feature selection algorithms and then scrutinizes current streaming feature selection algorithms to determine their strengths and weaknesses. The survey also sheds light on ongoing challenges in big data research.
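To make the streaming setting in the abstract concrete (a fixed set of instances, with feature columns arriving one at a time and an unknown total), here is a minimal Python sketch of a generic two-stage filter: each arriving feature is first tested for relevance to the target and then for redundancy against the features already kept. It is not any specific algorithm from the survey. The use of absolute Pearson correlation for both tests, the function names `pearson` and `streaming_select`, and the thresholds 0.3 and 0.9 are all illustrative assumptions; published methods typically rely on measures such as mutual information or conditional independence tests instead.

```python
import math
from typing import Dict, Iterable, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def streaming_select(
    feature_stream: Iterable[Tuple[str, List[float]]],
    target: List[float],
    relevance_threshold: float = 0.3,   # hypothetical cut-off
    redundancy_threshold: float = 0.9,  # hypothetical cut-off
) -> Dict[str, List[float]]:
    """Consume (name, column) pairs one at a time and return the kept features.

    Illustrative two-stage filter (relevance, then redundancy); not a
    specific published streaming feature selection algorithm.
    """
    selected: Dict[str, List[float]] = {}
    for name, column in feature_stream:
        # Instances are fixed in advance, so every arriving column must match the target length.
        if len(column) != len(target):
            raise ValueError(f"feature {name!r} has {len(column)} values, expected {len(target)}")
        # Stage 1 (relevance): drop features weakly correlated with the target.
        if abs(pearson(column, target)) < relevance_threshold:
            continue
        # Stage 2 (redundancy): drop features highly correlated with one already kept.
        if any(abs(pearson(column, kept)) >= redundancy_threshold for kept in selected.values()):
            continue
        selected[name] = column
    return selected


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6]

    def arriving_features():
        # Features are produced lazily, mimicking arrival over time.
        yield "f1", [1, 2, 3, 4, 5, 7]     # strongly relevant
        yield "f2", [2, 4, 6, 8, 10, 14]   # 2 * f1, hence redundant with f1
        yield "f3", [3, 1, 4, 1, 5, 9]     # moderately relevant, not redundant
        yield "f4", [5, 5, 5, 5, 5, 5]     # constant, hence irrelevant

    print(list(streaming_select(arriving_features(), target)))  # ['f1', 'f3']
```

Each decision in this sketch is final: a discarded feature is never reconsidered, which fits the one-pass nature of the streaming setting but makes the kept set depend on arrival order (had `f2` arrived before `f1`, `f2` would have been kept and `f1` dropped).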
Pages
113-135 (23 pages)