Fast and Scalable Feature Selection for Gene Expression Data Using Hilbert-Schmidt Independence Criterion

Cited by: 37
Authors
Gangeh, Mehrdad J. [1 ,2 ,3 ]
Zarkoob, Hadi [4 ]
Ghodsi, Ali [5 ]
Affiliations
[1] Univ Toronto, Dept Med Biophys & Radiat Oncol, Toronto, ON M5G 2M9, Canada
[2] Sunnybrook Hlth Sci Ctr, Dept Radiat Oncol, Toronto, ON M4N 3M5, Canada
[3] Sunnybrook Hlth Sci Ctr, Imaging Res Phys Sci, Toronto, ON M4N 3M5, Canada
[4] BaseHealth Inc, Sunnyvale, CA 94086 USA
[5] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Big data; feature selection; gene expression; Hilbert-Schmidt independence criterion; kernel methods; scalability; BIOMARKER SELECTION; CLASSIFICATION; ALGORITHMS; CARCINOMAS; STABILITY;
DOI
10.1109/TCBB.2016.2631164
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Goal: In computational biology, selecting a small subset of informative genes from microarray data continues to be a challenge due to the presence of thousands of genes. This paper aims to quantify the dependence between gene expression data and the response variables and to identify a subset of the most informative genes using a fast and scalable multivariate algorithm. Methods: A novel algorithm for feature selection from gene expression data was developed. The algorithm was based on the Hilbert-Schmidt independence criterion (HSIC), and was partly motivated by singular value decomposition (SVD). Results: The algorithm is computationally fast and scalable to large datasets. Moreover, it can be applied to problems with any type of response variable, including biclass, multiclass, and continuous response variables. The performance of the proposed algorithm in terms of accuracy, stability of the selected genes, speed, and scalability was evaluated using both synthetic and real-world datasets. The simulation results demonstrated that the proposed algorithm effectively and efficiently extracted stable genes with high predictive capability, in particular for datasets with multiclass response variables. Conclusion/Significance: The proposed method does not require the whole microarray dataset to be stored in memory, and thus can easily be scaled to large datasets. This capability is an important attribute in big data analytics, where data can be large and massively distributed.
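The sketch below is a minimal, assumed illustration (in Python/NumPy) of how the empirical HSIC statistic can score individual genes against a possibly multiclass response; it is not the paper's SVD-motivated multivariate algorithm. It uses an RBF kernel with a median-heuristic bandwidth on each gene, a delta kernel on class labels, and the standard biased estimator trace(K H L H) / (n-1)^2. The function names (rbf_kernel, hsic_score), parameter choices, and synthetic data are illustrative assumptions, not the authors' implementation.

import numpy as np

def rbf_kernel(x, sigma):
    # RBF (Gaussian) kernel matrix for a single gene's expression vector x (n samples)
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_score(x, y):
    # Biased empirical HSIC estimate between one gene x and class labels y
    n = x.shape[0]
    dists = np.abs(x[:, None] - x[None, :])
    sigma = np.median(dists[dists > 0]) if np.any(dists > 0) else 1.0  # median heuristic (assumed choice)
    K = rbf_kernel(x, sigma)
    L = (y[:, None] == y[None, :]).astype(float)   # delta kernel: 1 if two samples share a class
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: rank genes of a (samples x genes) expression matrix by HSIC with multiclass labels
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))             # synthetic expression data (hypothetical)
y = rng.integers(0, 3, size=60)            # three-class response
scores = np.array([hsic_score(X[:, j], y) for j in range(X.shape[1])])
top_genes = np.argsort(scores)[::-1][:20]  # indices of the 20 highest-scoring genes

Such per-gene scoring is univariate; the paper's contribution is a multivariate procedure that avoids holding the full microarray matrix in memory, which is what allows it to scale to large, distributed datasets.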
Pages: 167-181
Number of pages: 15