Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

Cited by: 27
Authors
Xu, Rui [1]
Damelin, Steven [2]
Nadler, Boaz [3]
Wunsch, Donald C., II [1]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Appl Computat Intelligence Lab, Rolla, MO 65409 USA
[2] Georgia So Univ, Dept Math Sci, Statesboro, GA 30460 USA
[3] Weizmann Inst Sci, Dept Appl Math & Comp Sci, IL-76100 Rehovot, Israel
Funding
National Science Foundation (USA);
Keywords
Clustering; Diffusion maps; Feature filtering; Fuzzy ART; Gene expression profiles; Multiclass cancer classification; Molecular classification; Feature selection; Prediction; Patterns;
DOI
10.1016/j.artmed.2009.06.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely known by cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality because of the overwhelming number of variables measured (genes) versus the small number of samples. Here, we use a two-step method to reduce the dimension of gene expression data and aim to address the problem of high dimensionality. Methods: First, we extract a subset of genes based on statistical characteristics of their corresponding gene expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, in order to obtain efficient representation of data geometric descriptions. Finally, a neural network clustering theory, fuzzy ART, is applied to the resulting data to generate clusters of cancer samples. Results: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms, such as the hierarchical clustering algorithm and K-means, show that our proposed method can effectively identify different cancer types and generate high-quality cancer sample clusters. Conclusion: The proposed feature selection methods and diffusion maps can achieve useful information from the multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis. (C) 2009 Elsevier B.V. All rights reserved.
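The two dimensionality-reduction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which statistic is used for gene filtering, so ranking genes by sample variance is an assumption made here. The Gaussian kernel, the median-distance bandwidth heuristic, and the names filter_genes_by_variance, diffusion_map, n_keep, epsilon, and t are likewise illustrative choices. The fuzzy ART clustering step is not shown.

```python
import numpy as np


def filter_genes_by_variance(X, n_keep):
    """Return sorted indices of the n_keep highest-variance columns (genes).

    Assumption: variance ranking stands in for the unspecified
    "statistical characteristics" used in the paper.
    """
    variances = np.var(X, axis=0, ddof=1)
    top = np.argsort(variances)[::-1][:n_keep]
    return np.sort(top)


def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed the rows of X (samples) using eigenvectors of a Markov matrix.

    Returns (coords, eigvals): coords has shape (n_samples, n_components),
    eigvals holds all eigenvalues in descending order.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    if epsilon is None:
        # Heuristic bandwidth (an assumption): median nonzero squared distance.
        epsilon = np.median(sq_dist[sq_dist > 0])
    K = np.exp(-sq_dist / epsilon)           # Gaussian affinity matrix
    d = K.sum(axis=1)                        # row sums (degrees)
    # The Markov matrix P = D^-1 K has the same eigenvalues as the symmetric
    # matrix A = D^-1/2 K D^-1/2, which is easier to diagonalize stably.
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are D^-1/2 times the eigenvectors of A.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining coordinates by eigenvalue**t (t is the diffusion time).
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return coords, eigvals
```

With an expression matrix X of shape (samples, genes), one would call kept = filter_genes_by_variance(X, n_keep) and then diffusion_map(X[:, kept]); the resulting low-dimensional coordinates are what a clustering algorithm would receive as input.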
Pages: 91 - 98
Page count: 8
Related papers
50 in total
  • [21] A feature group weighting method for subspace clustering of high-dimensional data
    Chen, Xiaojun
    Ye, Yunming
    Xu, Xiaofei
    Huang, Joshua Zhexue
    PATTERN RECOGNITION, 2012, 45 (01) : 434 - 446
  • [22] A GA-based Feature Selection for High-dimensional Data Clustering
    Sun, Mei
    Xiong, Langhuan
    Sun, Haojun
    Jiang, Dazhi
    THIRD INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING, 2009, : 769 - 772
  • [23] Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data
    Binh Tran
    Xue, Bing
    Zhang, Mengjie
    GENETIC PROGRAMMING, EUROGP 2017, 2017, 10196 : 210 - 226
  • [24] Clustering of High-Dimensional and Correlated Data
    McLachlan, Geoffrey J.
    Ng, Shu-Kay
    Wang, K.
    DATA ANALYSIS AND CLASSIFICATION, 2010, : 3 - 11
  • [25] Clustering in high-dimensional data spaces
    Murtagh, FD
    STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292
  • [26] Compressive Clustering of High-dimensional Data
    Ruta, Andrzej
    Porikli, Fatih
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 380 - 385
  • [27] A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data
    Farid, Dewan Md.
    Nowe, Ann
    Manderick, Bernard
    PROCEEDINGS OF 2016 FUTURE TECHNOLOGIES CONFERENCE (FTC), 2016, : 260 - 268
  • [28] A density-based clustering algorithm for high-dimensional data with feature selection
    Qi Xianting
    Wang Pan
    2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2016, : 114 - 118
  • [29] Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm
    Chakraborty, Saptarshi
    Paul, Debolina
    Das, Swagatam
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6930 - 6938
  • [30] Differential Privacy High-Dimensional Data Publishing Based on Feature Selection and Clustering
    Chu, Zhiguang
    He, Jingsha
    Zhang, Xiaolei
    Zhang, Xing
    Zhu, Nafei
    ELECTRONICS, 2023, 12 (09)