A Novel Approach for Feature Selection Based on MapReduce for Biomarker Discovery

Times cited: 0
Authors
Kourid, Ahlem [1]
Batouche, Mohamed [1]
Affiliation
[1] Constantine University 2, College of NTIC, Department of Computer Science, Constantine 25000, Algeria
Keywords
Feature selection; Scale Machine Learning; Big Data; Bioinformatics; Biomarker Discovery; EXPRESSION; GENES;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale feature selection is one of the most important problems in the big data domain, with applications to real-world data such as bioinformatics, where huge amounts of data must be processed. Because most feature selection algorithms are designed for a centralized computing architecture, their efficiency degrades significantly, or they become entirely inapplicable, once the data size exceeds hundreds of gigabytes. Distributed computing techniques such as MapReduce can therefore be applied to handle very large data. Our approach scales an existing feature selection method that combines K-means clustering and Signal-to-Noise Ratio (SNR) ranking with an optimization technique, Binary Particle Swarm Optimization (BPSO). The proposed method has two stages. In the first stage, we use parallel K-means on MapReduce to cluster the features, then apply an iterative MapReduce job that implements parallel SNR ranking for each cluster, and select the top-ranked feature from each cluster. These top-scoring features are gathered to form a new feature subset. In the second stage, this subset is used as input to a novel MapReduce-based BPSO, which produces an optimized feature subset. The proposed method is implemented in a distributed environment, and its efficiency is illustrated on practical problems such as biomarker discovery.
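The first stage can be pictured with a small in-process sketch. It is not the authors' code: MapReduce jobs are simulated by a hypothetical `run_mapreduce` helper instead of Hadoop, the SNR is assumed to follow the common Golub-style two-class definition (mean_1 - mean_0) / (std_1 + std_0), K-means uses a naive "first k features" initialization, and one feature with the largest |SNR| is kept per cluster. All function names and the toy data are illustrative, and the BPSO stage is not shown.

```python
# Hypothetical in-process sketch of stage 1 (not the authors' code):
# MapReduce-style K-means over features, then SNR ranking within clusters.
from collections import defaultdict
from math import sqrt


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job: map each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_features(features, k, iterations=10):
    """Cluster features (each a vector of values across samples) into k groups.

    Every iteration is one MapReduce job: the mapper assigns a feature to its
    nearest centroid, the reducer averages the vectors assigned to a cluster.
    """
    names = sorted(features)
    # Naive initialization (an assumption): the first k features are the centroids.
    centroids = {c: list(features[names[c]]) for c in range(k)}

    def nearest(vector):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(vector, centroids[c])))

    def mapper(name):
        yield nearest(features[name]), features[name]

    def reducer(_cluster, vectors):
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    for _ in range(iterations):
        # Clusters that receive no features keep their previous centroid.
        centroids.update(run_mapreduce(names, mapper, reducer))
    return {name: nearest(features[name]) for name in names}


def snr_scores(samples, labels):
    """Golub-style SNR per feature, (mean_1 - mean_0) / (std_1 + std_0), for labels 1/0.

    The mapper emits partial (count, sum, sum of squares) per (feature, class);
    the reducer combines them into a class mean and population standard deviation.
    """
    def mapper(indexed_sample):
        index, sample = indexed_sample
        for name, value in sample.items():
            yield (name, labels[index]), (1, value, value * value)

    def reducer(_key, parts):
        n = sum(p[0] for p in parts)
        mean = sum(p[1] for p in parts) / n
        variance = sum(p[2] for p in parts) / n - mean * mean
        return mean, sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors

    stats = run_mapreduce(enumerate(samples), mapper, reducer)
    scores = {}
    for name in samples[0]:
        (mean1, std1), (mean0, std0) = stats[(name, 1)], stats[(name, 0)]
        scores[name] = (mean1 - mean0) / (std1 + std0 + 1e-12)  # epsilon avoids division by zero
    return scores


def select_top_per_cluster(samples, labels, k):
    """Keep the feature with the largest |SNR| in each feature cluster."""
    names = sorted(samples[0])
    features = {name: [sample[name] for sample in samples] for name in names}
    clusters = kmeans_features(features, k)
    scores = snr_scores(samples, labels)
    best = {}
    for name in names:
        cluster = clusters[name]
        if cluster not in best or abs(scores[name]) > abs(scores[best[cluster]]):
            best[cluster] = name
    return sorted(best.values())


if __name__ == "__main__":
    # Toy data: 4 samples (two per class) and 4 hypothetical features g1..g4.
    samples = [
        {"g1": 10, "g2": 1, "g3": 9, "g4": 2},
        {"g1": 11, "g2": 2, "g3": 12, "g4": 3},
        {"g1": 1, "g2": 10, "g3": 2, "g4": 11},
        {"g1": 2, "g2": 11, "g3": 1, "g4": 10},
    ]
    labels = [1, 1, 0, 0]
    print(select_top_per_cluster(samples, labels, k=2))  # ['g1', 'g2']
```

Because a feature's SNR does not depend on the cluster it falls into, this simplification scores every feature in a single job and only then picks the best feature per cluster; the iterative per-cluster MapReduce ranking described in the abstract is not reproduced here. The selected features would then form the input subset for the BPSO stage.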
Pages: 10
Related papers
  • [1] Biomarker Discovery Based on Large-Scale Feature Selection and MapReduce
    Kourid, Ahlam
    Batouche, Mohamed
    [J]. COMPUTER SCIENCE AND ITS APPLICATIONS, CIIA 2015, 2015, 456 : 81 - 92
  • [2] Stable feature selection for biomarker discovery
    He, Zengyou
    Yu, Weichuan
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2010, 34 (04) : 215 - 225
  • [3] A novel class dependent feature selection method for cancer biomarker discovery
    Zhou, Wengang
    Dickerson, Julie A.
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2014, 47 : 66 - 75
  • [4] A Novel Configuration Tuning Method Based on Feature Selection for Hadoop MapReduce
    Liu, Jun
    Tang, Sule
    Xu, Guangxia
    Ma, Chuang
    Lin, Mingwei
    [J]. IEEE ACCESS, 2020, 8 : 63862 - 63871
  • [5] An Ensemble Feature Selection Method for Biomarker Discovery
    Shahrjooihaghighi, Aliasghar
    Frigui, Hichem
    Zhang, Xiang
    Wei, Xiaoli
    Shi, Biyun
    Trabelsi, Ameni
    [J]. 2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2017 : 416 - 421
  • [6] Bayesian Error Analysis for Feature Selection in Biomarker Discovery
    Pour, Ali Foroughi
    Dalton, Lori A.
    [J]. IEEE ACCESS, 2019, 7 : 127544 - 127563
  • [7] Ensemble Feature Selection for Biomarker Discovery in Mass Spectrometry-based Metabolomics
    ShahrjooiHaghighi, AliAsghar
    Frigui, Hichem
    Zhang, Xiang
    Wei, Xiaoli
    Shi, Biyun
    McClain, Craig J.
    [J]. SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019 : 19 - 24
  • [8] A Comparative Study of Feature Selection Methods for Biomarker Discovery
    Mungloo-Dilmohamud, Zahra
    Marigliano, Gary
    Jaufeerally-Fakim, Yasmina
    Pena-Reyes, Carlos
    [J]. PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018 : 2789 - 2791
  • [9] Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach
    Peralta, Daniel
    del Rio, Sara
    Ramirez-Gallego, Sergio
    Triguero, Isaac
    Benitez, Jose M.
    Herrera, Francisco
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [10] A novel approach toward optimal workflow selection for DNA methylation biomarker discovery
    Nazer, Naghme
    Sepehri, Mohammad Hossein
    Mohammadzade, Hoda
    Mehrmohamadi, Mahya
    [J]. BMC BIOINFORMATICS, 2024, 25 (01)