Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

Times cited: 27
Authors
Xu, Rui [1 ]
Damelin, Steven [2 ]
Nadler, Boaz [3 ]
Wunsch, Donald C., II [1 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Appl Computat Intelligence Lab, Rolla, MO 65409 USA
[2] Georgia So Univ, Dept Math Sci, Statesboro, GA 30460 USA
[3] Weizmann Inst Sci, Dept Appl Math & Comp Sci, IL-76100 Rehovot, Israel
Funding
National Science Foundation (USA)
Keywords
Clustering; Diffusion maps; Feature filtering; Fuzzy ART; Gene expression profiles; Multiclass cancer classification; Molecular classification; Feature selection; Prediction; Patterns
DOI
10.1016/j.artmed.2009.06.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely recognized by cancer researchers in recent years. However, a major challenge in the computational analysis of such data is the curse of dimensionality, caused by the overwhelming number of measured variables (genes) relative to the small number of samples. Here, we use a two-step method to reduce the dimensionality of gene expression data and thereby address this problem.

Methods: First, we extract a subset of genes based on the statistical characteristics of their expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, to obtain an efficient representation of the data's geometric structure. Finally, a neural network clustering method, Fuzzy ART, is applied to the resulting data to generate clusters of cancer samples.

Results: Experimental results on the small round blue-cell tumor (SRBCT) data set, compared with other widely used clustering algorithms such as hierarchical clustering and K-means, show that the proposed method can effectively identify different cancer types and generate high-quality clusters of cancer samples.

Conclusion: The proposed feature selection methods and diffusion maps can extract useful information from multidimensional gene expression data and prove effective at addressing the high dimensionality inherent in gene expression data analysis.

(C) 2009 Elsevier B.V. All rights reserved.
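The three steps described under Methods can be illustrated with a short Python/NumPy sketch. It is not the authors' implementation, and several choices are assumptions: ranking genes by variance stands in for the paper's statistical filtering criteria; the Gaussian kernel and the median heuristic for its bandwidth `eps` are illustrative; and the Fuzzy ART parameters (`rho`, `alpha`, `beta`) are arbitrary defaults. The function names `filter_genes`, `diffusion_map`, and `fuzzy_art` are hypothetical. The diffusion map embeds each sample x as (λ1^t ψ1(x), ..., λk^t ψk(x)), where the ψj are right eigenvectors of the Markov matrix P = D^-1 K; Fuzzy ART uses complement coding, the choice function T_j = |I ∧ w_j| / (α + |w_j|), and the vigilance test |I ∧ w_j| / |I| ≥ ρ.

```python
import numpy as np


def filter_genes(X, n_keep):
    """Step 1 (assumed criterion): keep the n_keep genes with the largest variance.

    X has shape (n_samples, n_genes). Variance ranking is only a stand-in for
    the paper's statistical filtering; returns the filtered matrix and kept indices.
    """
    var = X.var(axis=0)
    order = np.argsort(-var, kind="stable")   # highest variance first
    keep = np.sort(order[:n_keep])            # preserve original gene order
    return X[:, keep], keep


def diffusion_map(X, eps=None, n_coords=2, t=1):
    """Step 2: diffusion-map embedding Psi_t(x) = (lambda_k^t * psi_k(x)) for k >= 1."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    if eps is None:
        eps = np.median(sq)                   # assumed bandwidth heuristic
    K = np.exp(-sq / eps)                     # Gaussian kernel
    d = K.sum(axis=1)                         # row sums; P = D^-1 K is a Markov matrix
    A = K / np.sqrt(np.outer(d, d))           # symmetric matrix similar to P
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]    # sort descending; vals[0] is 1
    psi = vecs / np.sqrt(d)[:, None]          # right eigenvectors of P
    # Drop the trivial constant eigenvector psi_0 and scale by lambda^t.
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t


def fuzzy_art(X, rho=0.75, alpha=0.001, beta=1.0, n_epochs=10):
    """Step 3: Fuzzy ART clustering of the rows of X; returns (labels, weights)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)    # scale each feature to [0, 1]
    I = np.hstack([Xs, 1.0 - Xs])                       # complement coding
    W = []                                              # one weight vector per category
    labels = np.full(len(I), -1)
    for _ in range(n_epochs):
        changed = False
        for n, x in enumerate(I):
            chosen = None
            if W:
                Wa = np.array(W)
                match = np.minimum(x, Wa).sum(axis=1)   # |x ^ w_j| (fuzzy AND, L1 norm)
                T = match / (alpha + Wa.sum(axis=1))    # choice function T_j
                for j in np.argsort(-T, kind="stable"): # search categories by T_j
                    if match[j] / x.sum() >= rho:       # vigilance test
                        chosen = j
                        break
            if chosen is None:                          # no resonance: new category
                W.append(x.copy())
                chosen = len(W) - 1
            else:                                       # resonance: update weights
                W[chosen] = beta * np.minimum(x, W[chosen]) + (1 - beta) * W[chosen]
            if labels[n] != chosen:
                labels[n] = chosen
                changed = True
        if not changed:                                 # stop once assignments settle
            break
    return labels, np.array(W)


if __name__ == "__main__":
    # Synthetic stand-in for expression data: 3 hypothetical classes, 10 samples
    # each, 500 genes of which the first 30 carry class-specific shifts.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 500))
    for c in range(3):
        X[10 * c:10 * (c + 1), :30] += rng.normal(0, 3, size=30)
    Xf, keep = filter_genes(X, n_keep=50)
    Y = diffusion_map(Xf, n_coords=2)
    labels, W = fuzzy_art(Y, rho=0.8)
    print("cluster labels:", labels)
```

Run as a script, the demo filters 500 synthetic genes down to 50, embeds the 30 samples in two diffusion coordinates, and prints Fuzzy ART cluster labels. Raising `rho` yields more, tighter clusters; with `beta=1.0` (fast learning) category weights can only shrink, so assignments typically stabilize after a few epochs.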
Pages: 91-98
Number of pages: 8
Related papers
  • [1] Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps
    Xu, Rui
    Damelin, Steven
    Nadler, Boaz
    Wunsch, Donald C., II
    BMEI 2008: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS, VOL 1, 2008 : 245 - +
  • [2] Clustering High Dimensional Gene Expression Data via Two Step Feature Filtering
    Chen, Jianjiao
    Song, Anping
    Zhang, Wu
    2011 6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND CONVERGENCE INFORMATION TECHNOLOGY (ICCIT), 2012 : 299 - 303
  • [3] Benchmark of filter methods for feature selection in high-dimensional gene expression survival data
    Bommert, Andrea
    Welchowski, Thomas
    Schmid, Matthias
    Rahnenfuehrer, Joerg
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [4] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [5] Feature Selection in High-Dimensional Space with Applications to Gene Expression Data
    Pantha, Nishan
    Ramasubramanian, Muthukumaran
    Gurung, Iksha
    Maskey, Manil
    Sanders, Lauren M.
    Casaletto, James
    Costes, Sylvain V.
    SOUTHEASTCON 2024, 2024 : 6 - 15
  • [6] Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data
    Abedini, Mani
    Kirley, Michael
    Chiong, Raymond
    AUSTRALASIAN MEDICAL JOURNAL, 2013, 6 (05) : 272 - 279
  • [7] Robust clustering of noisy high-dimensional gene expression data for patients subtyping
    Coretto, Pietro
    Serra, Angela
    Tagliaferri, Roberto
    BIOINFORMATICS, 2018, 34 (23) : 4064 - 4072
  • [8] Integrative clustering methods for high-dimensional molecular data
    Chalise, Prabhakar
    Koestler, Devin C.
    Bimali, Milan
    Yu, Qing
    Fridley, Brooke L.
    TRANSLATIONAL CANCER RESEARCH, 2014, 3 (03) : 202 - 216
  • [9] Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data
    Jeong, Yunkyoung
    Baek, Jangsun
    KOREAN JOURNAL OF APPLIED STATISTICS, 2007, 20 (01) : 167 - 181
  • [10] On online high-dimensional spherical data clustering and feature selection
    Amayri, Ola
    Bouguila, Nizar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (04) : 1386 - 1398