Streaming feature selection algorithms for big data: A survey

Cited by: 29
Authors
AlNuaimi, Noura [1]
Masud, Mohammad Mehedy [1]
Serhani, Mohamed Adel [1]
Zaki, Nazar [1]
Affiliations
[1] United Arab Emirates Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big data; Redundant features; Relevant features; Streaming feature grouping; Streaming feature selection; ONLINE FEATURE-SELECTION; MUTUAL INFORMATION; GRANULATION; RELEVANCE; ENTROPY
DOI
10.1016/j.aci.2019.01.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations' decisions in real time. However, storing and processing large and varied datasets (known as big data) in real time is challenging. In machine learning, streaming feature selection has long been considered a strong technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to settings in which features arrive consecutively over time: the total number of features is not known in advance, whereas the number of instances is fixed. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to solve this problem. This paper presents an exhaustive and methodical introduction to these techniques. It first reviews the traditional feature-selection algorithms and then scrutinizes the current streaming-feature-selection algorithms to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
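The setting described in the abstract, with a fixed set of instances and features arriving one at a time, can be illustrated with a minimal sketch. This is not an algorithm taken from the survey: the function names, the use of Pearson correlation as the relevance and redundancy measure, and the two thresholds are all assumptions chosen for illustration. Each arriving feature is kept only if it is relevant to the labels and not redundant with a feature already kept.

```python
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_streaming_features(
    stream: Iterable[Tuple[str, Sequence[float]]],
    labels: Sequence[float],
    min_relevance: float = 0.5,   # hypothetical threshold
    max_redundancy: float = 0.9,  # hypothetical threshold
) -> List[str]:
    """Process features one at a time; the total number of features is never needed."""
    selected: Dict[str, List[float]] = {}
    for name, values in stream:
        # Relevance check: discard features weakly correlated with the labels.
        if abs(pearson(values, labels)) < min_relevance:
            continue
        # Redundancy check: discard features that nearly duplicate a kept one.
        if any(abs(pearson(values, kept)) > max_redundancy
               for kept in selected.values()):
            continue
        selected[name] = list(values)
    return list(selected)


def feature_stream():
    """Toy generator standing in for features that arrive over time."""
    yield "f1", [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]  # relevant
    yield "f2", [0.2, 0.4, 1.8, 1.6, 1.4, 0.6]  # 2 * f1, so redundant
    yield "f3", [1, 0, 1, 0, 1, 0]              # weakly related to the labels
    yield "f4", [0, 0, 1, 1, 0, 0]              # relevant, not redundant with f1


labels = [0, 0, 1, 1, 1, 0]  # one label per instance; the instance count is fixed
selected = select_streaming_features(feature_stream(), labels)
print(selected)  # ['f1', 'f4']
```

Because the stream is consumed through a generator, the selector never needs to know how many features will eventually arrive, which is the property that distinguishes the streaming setting from traditional batch feature selection.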
Pages: 113-135
Page count: 23