Streaming feature selection algorithms for big data: A survey

Cited by: 33
Authors
AlNuaimi, Noura [1]
Masud, Mohammad Mehedy [1]
Serhani, Mohamed Adel [1]
Zaki, Nazar [1]
Affiliations
[1] United Arab Emirates University, College of Information Technology, Al Ain, United Arab Emirates
Keywords
Big data; Redundant features; Relevant features; Streaming feature grouping; Streaming feature selection; ONLINE FEATURE-SELECTION; MUTUAL INFORMATION; GRANULATION; RELEVANCE; ENTROPY;
DOI
10.1016/j.aci.2019.01.001
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations' decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has long been considered an effective technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to the setting in which features arrive consecutively over time: the total number of features is not known in advance, whereas the number of instances is fixed. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to solve this problem. This paper presents an exhaustive and methodological introduction to these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
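The streaming setting described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the surveyed paper: the function names, the use of Pearson correlation as the relevance and redundancy measure, and both thresholds are assumptions chosen only for illustration. Features arrive one at a time over a fixed set of instances; each is kept only if it looks relevant to the target and is not redundant with a feature already kept.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_streaming_features(
    feature_stream: Iterable[Tuple[str, Sequence[float]]],
    target: Sequence[float],
    relevance_min: float = 0.5,   # hypothetical threshold, not from the paper
    redundancy_max: float = 0.9,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Process features one at a time; the number of instances stays fixed."""
    selected: List[Tuple[str, Sequence[float]]] = []
    for name, column in feature_stream:
        # Relevance check: discard features weakly correlated with the target.
        if abs(pearson(column, target)) < relevance_min:
            continue
        # Redundancy check: discard features nearly collinear with a kept one.
        if any(abs(pearson(column, kept)) > redundancy_max for _, kept in selected):
            continue
        selected.append((name, column))
    return [name for name, _ in selected]


# Toy data: five instances, three features arriving in sequence.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
stream = [
    ("f1", [1.1, 1.9, 3.2, 3.9, 5.1]),   # tracks the target closely
    ("f2", [2.2, 3.8, 6.4, 7.8, 10.2]),  # exactly 2 * f1, so redundant
    ("f3", [3.0, 1.0, 2.0, 1.0, 3.0]),   # uncorrelated with the target
]
print(select_streaming_features(stream, target))
```

Published streaming-feature-selection algorithms typically use more principled tests than a fixed correlation threshold; the sketch only shows the one-feature-at-a-time control flow.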
Pages: 113-135
Page count: 23
Related papers
50 in total
  • [21] Feature Selection Algorithms in Intrusion Detection System: A Survey
    Maza, Sofiane
    Touahria, Mohamed
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2018, 12 (10): 5079-5099
  • [22] Feature Interaction for Streaming Feature Selection
    Zhou, Peng
    Li, Peipei
    Zhao, Shu
    Wu, Xindong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10): 4691-4702
  • [23] How big is Big Data? A comprehensive survey of data production, storage, and streaming in science and industry
    Clissa, Luca
    Lassnig, Mario
    Rinaldi, Lorenzo
    FRONTIERS IN BIG DATA, 2023, 6
  • [24] Deep Learning for IoT Big Data and Streaming Analytics: A Survey
    Mohammadi, Mehdi
    Al-Fuqaha, Ala
    Sorour, Sameh
    Guizani, Mohsen
    IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, 2018, 20 (04): 2923-2960
  • [25] Stochastic feature selection with annealing and its applications to streaming data
    Sun, Lizhe
    Barbu, Adrian
    JOURNAL OF NONPARAMETRIC STATISTICS, 2025
  • [26] Automating Feature Extraction and Feature Selection in Big Data Security Analytics
    Sisiaridis, Dimitrios
    Markowitch, Olivier
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2018), PT II, 2018, 10842: 423-432
  • [27] Study on Feature Selection and Feature Deep Learning Model For Big Data
    Yu, Ping
    Yan, Hui
    2018 3RD INTERNATIONAL CONFERENCE ON SMART CITY AND SYSTEMS ENGINEERING (ICSCSE), 2018: 792-795
  • [28] A survey on data-efficient algorithms in big data era
    Adadi, Amina
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [29] Towards ultrahigh dimensional feature selection for big data
    Tan, Mingkui
    Tsang, Ivor W.
    Wang, Li
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15: 1371-1429
  • [30] Feature Selection Using Genetic Algorithm for Big Data
    Saidi, Rania
    Ncir, Waad Bouaguel
    Essoussi, Nadia
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723: 352-361