A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis

Times cited: 0

Authors:

Borah, Kasmika [1]

Das, Himanish Shekhar [1]

Seth, Soumita [2]

Mallick, Koushik [3]

Rahaman, Zubair [4]

Mallik, Saurav [5, 6]

Affiliations:

[1] Cotton Univ, Dept Comp Sci & Informat Technol, Gauhati 781001, Assam, India

[2] Future Inst Engn & Management, Dept Comp Sci & Engn, Kolkata 700150, West Bengal, India

[3] RCC Inst Informat Technol, Dept Comp Sci & Engn, Canal S Rd, Kolkata 700015, West Bengal, India

[4] Vitas Healthcare, Kissimmee, FL USA

[5] Harvard T H Chan Sch Publ Hlth, Dept Environm Hlth, Boston, MA 02115 USA

[6] Univ Arizona, Dept Pharmacol & Toxicol, Tucson, AZ 85721 USA

Source:

FUNCTIONAL & INTEGRATIVE GENOMICS | 2024 / Vol. 24 / Issue 5

Keywords:

Feature Selection; Feature Extraction; Dimensionality Reduction; Next Generation Sequencing data; GENE-EXPRESSION DATA; UNSUPERVISED FEATURE-SELECTION; RNA SEQUENCING DATA; MUTUAL INFORMATION; MARKER SELECTION; SEQ DATA; CLASSIFICATION; ALGORITHM; RELEVANCE; FRAMEWORK

DOI:

10.1007/s10142-024-01415-x

Chinese Library Classification (CLC) number:

Q3 [Genetics]

Subject classification codes:

071007; 090102

Abstract:

Recent advances in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to substantial growth in the volume and density of data. High-dimensional NGS data, in which the number of genomic, transcriptomic, proteomic, and metagenomic features far exceeds the number of biological samples, pose significant challenges for analysis, including increased computational burden, a higher risk of overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques for addressing these challenges: by reducing the dimensionality of the data, they can improve model performance, interpretability, and computational efficiency. Both can be broadly categorized into statistical and machine learning methods. The present study conducts a comprehensive, comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The surveyed techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been applied to microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review aims to help readers apply feature selection and feature extraction techniques to improve the performance of predictive models, uncover underlying biological patterns, and gain deeper insight into large, complex NGS and microarray datasets.
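
To make the distinction between the two families concrete, here is a minimal sketch on a simulated expression matrix with far more genes than samples. It is not taken from the review. It assumes NumPy and uses two stand-in methods chosen only for illustration: a simple variance filter as the feature selection step, which keeps a subset of the original genes, and principal component analysis (via SVD) as the feature extraction step, which builds new features from combinations of all genes. The function names `select_top_variance` and `extract_pca` and the data sizes are hypothetical.

```python
import numpy as np

def select_top_variance(X, k):
    """Feature selection (filter): keep the k original columns with the highest variance."""
    variances = X.var(axis=0)
    idx = np.argsort(variances)[::-1][:k]  # indices of the k most variable columns
    return X[:, idx], idx

def extract_pca(X, k):
    """Feature extraction: project centred data onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)
    # Thin SVD; rows of Vt are principal axes ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # scores: each new feature mixes all original genes

# Hypothetical simulated data: 20 samples x 1,000 genes (p >> n)
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 1000
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] *= 10  # make five simulated "informative" genes highly variable

X_sel, kept = select_top_variance(X, 5)
X_pca = extract_pca(X, 5)
print(X_sel.shape, sorted(kept.tolist()))
print(X_pca.shape)
```

Both outputs have five columns. The selected matrix retains gene identities (here, the five scaled columns), which helps interpretation. The PCA scores are mutually uncorrelated linear combinations of all 1,000 genes, which compress variance well but are harder to map back to individual genes. This is one instance of the interpretability trade-off between selection and extraction.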


Pages: 31
