Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

Cited by: 27
Authors
Xu, Rui [1 ]
Damelin, Steven [2 ]
Nadler, Boaz [3 ]
Wunsch, Donald C., II [1 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Appl Computat Intelligence Lab, Rolla, MO 65409 USA
[2] Georgia So Univ, Dept Math Sci, Statesboro, GA 30460 USA
[3] Weizmann Inst Sci, Dept Appl Math & Comp Sci, IL-76100 Rehovot, Israel
Funding
National Science Foundation (USA)
Keywords
Clustering; Diffusion maps; Feature filtering; Fuzzy ART; Gene expression profiles; MULTICLASS CANCER CLASSIFICATION; MOLECULAR CLASSIFICATION; FEATURE-SELECTION; PREDICTION; PATTERNS;
DOI
10.1016/j.artmed.2009.06.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely recognized by cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality, caused by the overwhelming number of measured variables (genes) relative to the small number of samples. Here, we use a two-step method to reduce the dimension of gene expression data and so address the problem of high dimensionality.
Methods: First, we extract a subset of genes based on statistical characteristics of their corresponding gene expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, in order to obtain an efficient representation of the geometric description of the data. Finally, a neural network clustering approach, fuzzy ART (adaptive resonance theory), is applied to the resulting data to generate clusters of cancer samples.
Results: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms such as hierarchical clustering and K-means, show that the proposed method can effectively identify different cancer types and generate high-quality clusters of cancer samples.
Conclusion: The proposed feature selection methods and diffusion maps can extract useful information from multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis. (C) 2009 Elsevier B.V. All rights reserved.
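The two dimensionality-reduction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which statistical criterion is used for filtering, so ranking genes by variance is an assumption here, as are the Gaussian kernel, the median-distance choice of `epsilon`, and the function names `filter_genes_by_variance` and `diffusion_map`. The final fuzzy ART clustering step is left out.

```python
import numpy as np


def filter_genes_by_variance(X, n_keep):
    """Keep the n_keep highest-variance genes (columns) of X (samples x genes).

    Variance ranking is an assumed stand-in for the paper's filtering criterion.
    """
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:n_keep])
    return X[:, keep], keep


def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed the rows of X using eigenvectors of a Markov matrix.

    Returns (coords, eigenvalues); coords has shape (n_samples, n_components).
    """
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    if epsilon is None:
        # Assumed heuristic: median of the non-zero squared distances.
        epsilon = np.median(d2[d2 > 0])

    # Gaussian kernel and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)

    # The Markov matrix P = D^-1 K is not symmetric, so diagonalize its
    # symmetric conjugate A = D^-1/2 K D^-1/2, which has the same eigenvalues.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^-1/2 times the eigenvectors of A.
    psi = vecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining ones by eigenvalue**t to get diffusion coordinates.
    idx = slice(1, n_components + 1)
    coords = psi[:, idx] * vals[idx] ** t
    return coords, vals
```

A call such as `coords, _ = diffusion_map(filter_genes_by_variance(X, 100)[0], n_components=3)` would yield a low-dimensional representation of the samples that a clustering algorithm could then take as input. The values 100 and 3 are arbitrary placeholders, not settings from the paper.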
Pages: 91-98
Number of pages: 8
Related papers
50 in total
  • [11] A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications
    Jiang H.
    Wang G.
    Gao J.
    Gao Z.
    Gao R.
    Guo Q.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (12) : 49 - 55 and 90
  • [12] Latent Feature Group Learning for High-Dimensional Data Clustering
    Wang, Wenting
    He, Yulin
    Ma, Liheng
    Huang, Joshua Zhexue
    INFORMATION, 2019, 10 (06)
  • [13] On the scalability of feature selection methods on high-dimensional data
    V. Bolón-Canedo
    D. Rego-Fernández
    D. Peteiro-Barral
    A. Alonso-Betanzos
    B. Guijarro-Berdiñas
    N. Sánchez-Maroño
    Knowledge and Information Systems, 2018, 56 : 395 - 442
  • [14] On the scalability of feature selection methods on high-dimensional data
    Bolon-Canedo, V.
    Rego-Fernandez, D.
    Peteiro-Barral, D.
    Alonso-Betanzos, A.
    Guijarro-Berdinas, B.
    Sanchez-Marono, N.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 56 (02) : 395 - 442
  • [15] Clustering massive high dimensional data with dynamic feature maps
    Amarasiri, Rasika
    Alahakoon, Damminda
    Smith-Miles, Kate
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 814 - 823
  • [16] FEATURE CLUSTERING FOR PSO-BASED FEATURE CONSTRUCTION ON HIGH-DIMENSIONAL DATA
    Swesi, Idheba Mohamad Ali Omer
    Abu Bakar, Azuraliza
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2019, 18 (04) : 439 - 472
  • [17] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [18] Clustering High-Dimensional Data
    Masulli, Francesco
    Rovetta, Stefano
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 1 - 13
  • [19] FAST HIGH-DIMENSIONAL FILTERING USING CLUSTERING
    Nair, Pravin
    Chaudhury, Kunal N.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017 : 240 - 244
  • [20] Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data
    Qin, Xiwen
    Zhang, Siqi
    Dong, Xiaogang
    Shi, Hongyu
    Yuan, Liping
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (09) : 13005 - 13027