Feature selection techniques in the context of big data: taxonomy and analysis

Authors
Hudhaifa Mohammed Abdulwahab
S. Ajitha
Mufeed Ahmed Naji Saif
Affiliations
[1] Ramaiah Institute of Technology (Affiliated to VTU University),Department of Computer Application
[2] Sri Jayachamarajendra College of Engineering (Affiliated to VTU University),Department of Computer Applications
Source
Applied Intelligence | 2022 / Vol. 52, No. 12
Keywords
Big Data; Dimensionality Reduction; Feature Selection; Streaming Feature
Abstract
Recent advances in Information Technology (IT) have led to the rapid production of big data, as enormous volumes of data with high-dimensional features grow exponentially across different fields. Dealing with such high-dimensional data creates new challenges for both the efficiency and the effectiveness of data processing. To address these challenges, Feature Selection (FS) is one of the most widely used dimensionality reduction methods: it reduces the dimensionality of large-scale data by selecting a small subset of relevant, significant features and eliminating irrelevant and redundant ones, so that effective prediction models can be constructed. This article provides a comprehensive review of recent FS approaches in the context of big data, together with a structured taxonomy that categorizes existing methods by their nature, search strategy, evaluation process, and feature structure. It also presents a qualitative analysis of FS methods in terms of their objective, structure, search strategy, schema, learning task, strengths, and weaknesses. In addition, a quantitative analysis shows the number of FS-related publications by timeline, main category, and sub-category. An experimental study compares ten methods from different categories on twelve benchmark datasets from the University of California, Irvine (UCI) Machine Learning Repository and the Arizona State University (ASU) Feature Selection Repository, evaluating their performance in terms of accuracy, precision, recall, F-measure, and the number of selected features. Finally, we highlight open research issues and challenges related to FS to help researchers identify future research directions.
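The filter idea summarized in the abstract (keep features relevant to the target, discard features redundant with ones already kept) can be sketched in a few lines. This is a minimal, hypothetical illustration, assuming numeric features and a numeric (e.g. 0/1) class label; it uses absolute Pearson correlation as both the relevance and the redundancy measure, which captures only linear association, and it is not one of the ten methods compared in the article. The helper names `pearson`, `select_features`, and `binary_metrics` and the default 0.9 redundancy threshold are illustrative assumptions. `binary_metrics` computes standard binary-classification forms of the accuracy, precision, recall, and F-measure (here F1) mentioned above.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(X: List[List[float]], y: Sequence[float],
                    k: int, redundancy_threshold: float = 0.9) -> List[int]:
    """Hypothetical greedy correlation filter (illustrative only).

    Ranks columns by |corr(feature, y)| (relevance), then skips any column
    whose |corr| with an already-selected column exceeds the threshold
    (redundancy). Returns up to k column indices, most relevant first.
    """
    columns = [list(col) for col in zip(*X)]  # rows -> columns
    relevance = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected: List[int] = []
    for j in ranked:
        if len(selected) == k:
            break
        if relevance[j] == 0.0:
            continue  # irrelevant: no linear association with the target
        if any(abs(pearson(columns[j], columns[s])) > redundancy_threshold
               for s in selected):
            continue  # redundant with a feature that was already kept
        selected.append(j)
    return selected


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

For example, if one column is an exact multiple of another, `select_features` keeps only one of the pair, because their absolute correlation is 1 and therefore exceeds the redundancy threshold. Real big-data FS methods replace these pieces with stronger relevance measures (e.g. mutual information), different search strategies, and distributed or streaming computation.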
Pages: 13568-13613
Page count: 45