Effective Data Dimensionality Reduction Workflow for High-Dimensional Gene Expression Datasets

Times cited: 0
Authors
Das, Utsha [1]
Srizon, Azmain Yakin [1]
Hasan, Md Al Mehedi [1]
Rahman, Julia [1]
Ben Islam, Md Khaled [2]
Affiliations
[1] Rajshahi University of Engineering & Technology, Department of Computer Science & Engineering, Rajshahi, Bangladesh
[2] Pabna University of Science and Technology, Department of Computer Science & Engineering, Pabna, Bangladesh
Keywords
T-Test; Principal Component Analysis; Recursive Feature Elimination; Random Forest; Support Vector Machine; Classification; Prediction
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline codes
0808; 0809
Abstract
As we move into the era of "Big Data", the curse of dimensionality is becoming one of the most pressing obstacles in bioinformatics and biomedical research. Omics classification tasks typically involve irrelevant and redundant features that increase computation time and degrade classification performance. Previous studies have shown that combining univariate and multivariate feature selection methods can improve classification performance. In this study, we propose a workflow that achieves better classification performance on gene expression data while using fewer variables. To evaluate it, we used four gene expression datasets: GSE5325, GSE6919/GPL8300, GSE6919/GPL92, and GSE6919/GPL93. We first applied Student's t-test to discard features that are not differentially expressed between the classes. Principal Component Analysis (PCA) was then applied to reduce the dimensionality of the remaining data. Next, the wrapper method Recursive Feature Elimination (RFE) was run on the reduced data to find the best-performing combination of principal components. Finally, a Support Vector Machine (SVM) classifier was used to measure performance, and the results were compared with those of previous studies. The results show that the proposed approach achieves better performance with far fewer variables on gene expression data. All research resources, documents, programs, and code snippets are available at https://github.com/Srizon143005/DataReductionWorkflow.
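The four-stage workflow described in the abstract (t-test filter, PCA, RFE over the principal components, SVM) can be sketched as a scikit-learn pipeline. This is a minimal sketch under stated assumptions, not the authors' code (their programs are in the GitHub repository cited in the abstract). The p-value cutoff (alpha = 0.05), the number of principal components (10), the number of components kept by RFE (5), the linear SVM kernel, and the synthetic data standing in for the GEO datasets are all illustrative assumptions. The helper names t_test_score and build_workflow are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectFpr
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def t_test_score(X, y):
    """Hypothetical helper: per-gene two-sample Student's t-test (equal variances).

    Returns (scores, p-values), the form scikit-learn's univariate selectors expect.
    Assumes exactly two class labels.
    """
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("t_test_score expects exactly two classes")
    t_stat, p_val = ttest_ind(X[y == classes[0]], X[y == classes[1]],
                              axis=0, equal_var=True)
    return np.abs(t_stat), p_val


def build_workflow(alpha=0.05, n_pcs=10, n_selected_pcs=5):
    """Hypothetical helper: t-test filter -> PCA -> RFE over PCs -> linear SVM.

    The default parameter values are illustrative, not the paper's settings.
    """
    return Pipeline([
        # Keep only genes whose t-test p-value is below alpha.
        ("ttest", SelectFpr(score_func=t_test_score, alpha=alpha)),
        # Project the surviving genes onto n_pcs principal components.
        ("pca", PCA(n_components=n_pcs)),
        # Wrapper step: recursively drop the PC with the smallest linear-SVM weight.
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=n_selected_pcs)),
        # Final classifier trained on the retained PCs.
        ("svm", SVC(kernel="linear")),
    ])


# Synthetic stand-in for a two-class expression matrix (samples x genes):
# 80 samples, 1000 genes, with the first 30 genes shifted upward in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 1000))
y = np.repeat([0, 1], 40)
X[y == 1, :30] += 1.5

# Because every step is inside the pipeline, the t-test, PCA and RFE are refit
# on each training fold and never see the held-out samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(build_workflow(), X, y, cv=cv)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping all four steps in a single Pipeline is a deliberate choice: running the t-test on the full dataset before splitting would let the held-out labels influence feature selection and inflate the reported accuracy.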
Pages: 182-185
Page count: 4