Effective Data Dimensionality Reduction Workflow for High-Dimensional Gene Expression Datasets

Authors
Das, Utsha [1]
Srizon, Azmain Yakin [1]
Hasan, Md Al Mehedi [1]
Rahman, Julia [1]
Ben Islam, Md Khaled [2]
Affiliations
[1] Rajshahi Univ Engn & Technol, Dept Comp Sci & Engn, Rajshahi, Bangladesh
[2] Pabna Univ Sci & Technol, Dept Comp Sci & Engn, Pabna, Bangladesh
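Keywords
T-Test; Principal Component Analysis; Recursive Feature Elimination; Random Forest; Support Vector Machine; Classification; Prediction
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract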
As research moves into the era of 'Big Data', the curse of dimensionality is becoming one of the most pressing obstacles in bioinformatics and biomedical research. Omics classification tasks typically involve irrelevant and redundant features that lengthen computation and reduce classification performance. Previous studies have shown that combining univariate and multivariate feature selection methods may improve classification performance. In this research, we propose a workflow that achieves better classification performance on gene expression data while using fewer variables. To support this claim, we used four gene expression datasets: GSE5325, GSE6919/GPL8300, GSE6919/GPL92, and GSE6919/GPL93. We first applied Student's t-test to discard redundant features. Principal Component Analysis (PCA) was then applied to reduce the dimensionality of the data. A wrapper method, Recursive Feature Elimination (RFE), was run on the reduced data to find the combination of principal components that gives the best performance. Finally, a Support Vector Machine (SVM) was used to measure performance, and the outcomes were compared with previous research. The results showed that the proposed approach performed better with far fewer variables for gene expression data. All our research resources, documents, programs and snippets are located at https://github.com/Srizon143005/DataReductionWorkflow.
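The four stages named in the abstract (t-test filter, PCA, RFE over the principal components, SVM evaluation) can be sketched roughly as follows. This is a minimal illustration using SciPy and scikit-learn on synthetic data, not the authors' implementation. The function names (`make_synthetic_data`, `ttest_filter`, `run_workflow`), the thresholds (`alpha`, `n_components`, `n_selected`), the choice of a random forest as the RFE ranking estimator, and the linear SVM kernel are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_data(n_samples=80, n_genes=500, n_informative=20, seed=0):
    """Hypothetical stand-in for a two-class gene expression matrix."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_genes))
    X[y == 1, :n_informative] += 1.5  # shift a few "genes" in class 1
    return X, y


def ttest_filter(X, y, alpha=0.01):
    """Indices of features whose two-class Student's t-test p-value is below alpha."""
    _, p_values = ttest_ind(X[y == 0], X[y == 1], axis=0)
    return np.flatnonzero(p_values < alpha)


def run_workflow(X_train, y_train, X_test, y_test,
                 alpha=0.01, n_components=10, n_selected=5):
    # Stage 1: univariate filter, fitted on the training split only.
    keep = ttest_filter(X_train, y_train, alpha)
    scaler = StandardScaler().fit(X_train[:, keep])
    Z_train = scaler.transform(X_train[:, keep])
    Z_test = scaler.transform(X_test[:, keep])

    # Stage 2: PCA on the surviving features.
    n_components = min(n_components, Z_train.shape[0] - 1, Z_train.shape[1])
    pca = PCA(n_components=n_components, random_state=0).fit(Z_train)
    P_train, P_test = pca.transform(Z_train), pca.transform(Z_test)

    # Stage 3: RFE picks a subset of principal components.
    # The random forest ranking estimator is an assumption of this sketch.
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=min(n_selected, n_components))
    rfe.fit(P_train, y_train)

    # Stage 4: SVM trained and scored on the selected components.
    svm = SVC(kernel="linear").fit(rfe.transform(P_train), y_train)
    return {
        "n_genes_kept": int(keep.size),
        "selected_components": np.flatnonzero(rfe.support_),
        "accuracy": float(svm.score(rfe.transform(P_test), y_test)),
    }


X, y = make_synthetic_data()
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
result = run_workflow(X_tr, y_tr, X_te, y_te)
print(result)
```

Every stage is fitted on the training split only and then applied to the test split, so the held-out accuracy is not inflated by feature selection that has already seen the test samples.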
Pages: 182-185
Page count: 4