A biobjective feature selection algorithm for large omics datasets

Cited by: 1
Authors
Cavique, Luis [1, 2]
Mendes, Armando B. [3, 4]
Martiniano, Hugo F. M. C. [1, 5]
Correia, Luis [1]
Affiliations
[1] FCUL, MAS BioISI, Lisbon, Portugal
[2] Univ Aberta, Lisbon, Portugal
[3] Univ Acores, Ponta Delgada, Portugal
[4] Univ Minho, Algoritmi, Braga, Portugal
[5] Inst Dr Ricardo Jorge, Lisbon, Portugal
Keywords
biobjective optimization; feature selection; heuristic decomposition; logical analysis of data
DOI
10.1111/exsy.12301
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. Its performance measures encompass predictive accuracy and the comprehensibility of the result. Consistency-based methods are a significant category of feature selection research that substantially improves result comprehensibility by applying the parsimony principle. In this work, the biobjective version of the logical analysis of inconsistent data algorithm is applied to large volumes of data. To deal with hundreds of thousands of attributes, a heuristic decomposition uses parallel processing to solve a set covering problem, together with a cross-validation technique. Each biobjective solution reports the number of features in the reduced set and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics from patients with rare diseases.
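As a rough illustration of the set covering view mentioned in the abstract, the sketch below treats every pair of observations from different classes as a constraint that must be "covered" by at least one selected feature on which the two observations differ. It uses a plain greedy heuristic, which is an assumption made here for brevity and not the authors' method; the function name `greedy_feature_cover` and the toy data are hypothetical, and the heuristic decomposition, parallel processing, and cross-validation steps are not modeled.

```python
from itertools import combinations


def greedy_feature_cover(rows, labels):
    """Greedy set cover over class-discordant pairs of observations.

    Returns (selected feature indices, pairs no feature can separate).
    Illustrative sketch only; not the algorithm from the paper.
    """
    n_features = len(rows[0])
    # Each constraint is a pair of rows that belong to different classes.
    pairs = [(i, j) for i, j in combinations(range(len(rows)), 2)
             if labels[i] != labels[j]]
    # covers[f] holds the indices of the pairs that feature f distinguishes.
    covers = [
        {k for k, (i, j) in enumerate(pairs) if rows[i][f] != rows[j][f]}
        for f in range(n_features)
    ]
    uncovered = set(range(len(pairs)))
    selected = []
    while uncovered:
        # Pick the feature that separates the most still-uncovered pairs.
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            # Remaining pairs agree on every feature: the data is inconsistent.
            break
        selected.append(best)
        uncovered -= gained
    return selected, [pairs[k] for k in sorted(uncovered)]


# Toy XOR-like data: the class depends on features 0 and 1; feature 2 is constant.
rows = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
labels = [0, 1, 1, 0]
print(greedy_feature_cover(rows, labels))  # ([0, 1], [])
```

The second returned value lists pairs of observations that have identical feature values but different classes, which is the kind of inconsistency a consistency-based method has to handle explicitly. A biobjective variant would additionally record an accuracy estimate for each candidate feature subset alongside its size.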
Pages: 14