A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis

Times cited: 0
Authors
Borah, Kasmika [1]
Das, Himanish Shekhar [1]
Seth, Soumita [2]
Mallick, Koushik [3]
Rahaman, Zubair [4]
Mallik, Saurav [5,6]
Affiliations
[1] Cotton University, Department of Computer Science and Information Technology, Gauhati 781001, Assam, India
[2] Future Institute of Engineering and Management, Department of Computer Science and Engineering, Kolkata 700150, West Bengal, India
[3] RCC Institute of Information Technology, Department of Computer Science and Engineering, Canal South Road, Kolkata 700015, West Bengal, India
[4] Vitas Healthcare, Kissimmee, FL, USA
[5] Harvard T.H. Chan School of Public Health, Department of Environmental Health, Boston, MA 02115, USA
[6] University of Arizona, Department of Pharmacology and Toxicology, Tucson, AZ 85721, USA
Keywords
Feature Selection; Feature Extraction; Dimensionality Reduction; Next Generation Sequencing Data; Gene-Expression Data; Unsupervised Feature-Selection; RNA Sequencing Data; Mutual Information; Marker Selection; Seq Data; Classification; Algorithm; Relevance; Framework
DOI
10.1007/s10142-024-01415-x
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Recent advances in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to substantial growth in the volume and density of data. High-dimensional NGS data are characterized by a large number of genomic, transcriptomic, proteomic, and metagenomic features relative to the number of biological samples, which makes reducing feature dimensionality challenging. This high dimensionality complicates data analysis: it increases the computational burden, raises the risk of overfitting, and makes results harder to interpret. Feature selection and feature extraction are two pivotal techniques for addressing these challenges; by reducing the dimensionality of the data, they can improve model performance, interpretability, and computational efficiency. Both can be broadly categorized into statistical and machine learning methods. The present study conducts a comprehensive, comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The surveyed techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been applied to microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review is intended to help readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insight into massive and complex NGS and microarray data.
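To make the distinction between the two families concrete, here is a minimal illustrative sketch in Python (not code from the review). It assumes a small samples-by-features matrix standing in for normalized expression values, and uses an unsupervised variance filter as a representative feature-selection method and principal component analysis (PCA), computed by power iteration, as a representative feature-extraction method. The function names and toy data are hypothetical; real pipelines would normally rely on established libraries and properly normalized counts.

```python
import math
import random
from typing import List, Tuple

# Rows are samples, columns are features (e.g. genes); a toy stand-in for an NGS matrix.
Matrix = List[List[float]]


def column_variances(X: Matrix) -> List[float]:
    """Sample variance (n - 1 denominator) of every feature column."""
    n = len(X)
    variances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / (n - 1))
    return variances


def variance_filter(X: Matrix, k: int) -> List[int]:
    """Feature SELECTION: return the indices of the k most variable original features."""
    variances = column_variances(X)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def first_principal_component(
    X: Matrix, iters: int = 200, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Feature EXTRACTION: leading PCA loading vector (power iteration on the
    covariance matrix) and each sample's score on that new, derived feature."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre columns
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # every feature is constant: no direction of variance
            break
        v = [x / norm for x in w]

    # Eigenvectors are defined only up to sign; make the largest loading positive.
    lead = max(range(p), key=lambda j: abs(v[j]))
    if v[lead] < 0:
        v = [-x for x in v]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


if __name__ == "__main__":
    # Five samples x four features: 0 and 1 co-vary strongly, 2 is constant, 3 is low-variance noise.
    X = [
        [1.0, 1.1, 5.0, 0.2],
        [2.0, 2.1, 5.0, 0.1],
        [3.0, 2.9, 5.0, 0.3],
        [4.0, 4.2, 5.0, 0.2],
        [5.0, 4.9, 5.0, 0.1],
    ]
    print(variance_filter(X, k=2))  # -> [0, 1]: a subset of the original columns
    loadings, scores = first_principal_component(X)
    print([round(x, 3) for x in loadings])  # one new feature mixing all columns
```

The selection step returns column indices, so the retained features remain individual, interpretable genes; the extraction step returns a loading vector that combines every input feature into one derived variable, which can capture more of the variance but is harder to map back to specific genes.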
Pages: 31