Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

被引:27
|
作者
Xu, Rui [1 ]
Damelin, Steven [2 ]
Nadler, Boaz [3 ]
Wunsch, Donald C., II [1 ]
机构
[1] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Appl Computat Intelligence Lab, Rolla, MO 65409 USA
[2] Georgia So Univ, Dept Math Sci, Statesboro, GA 30460 USA
[3] Weizmann Inst Sci, Dept Appl Math & Comp Sci, IL-76100 Rehovot, Israel
基金
美国国家科学基金会;
关键词
Clustering; Diffusion maps; Feature filtering; Fuzzy ART; Gene expression profiles; MULTICLASS CANCER CLASSIFICATION; MOLECULAR CLASSIFICATION; FEATURE-SELECTION; PREDICTION; PATTERNS;
D O I
10.1016/j.artmed.2009.06.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely known by cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality because of the overwhelming number of variables measured (genes) versus the small number of samples. Here, we use a two-step method to reduce the dimension of gene expression data and aim to address the problem of high dimensionality. Methods: First, we extract a subset of genes based on statistical characteristics of their corresponding gene expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, in order to obtain efficient representation of data geometric descriptions. Finally, a neural network clustering theory, fuzzy ART, is applied to the resulting data to generate clusters of cancer samples. Results: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms, such as the hierarchical clustering algorithm and K-means, show that our proposed method can effectively identify different cancer types and generate high-quality cancer sample clusters. Conclusion: The proposed feature selection methods and diffusion maps can achieve useful information from the multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis. (C) 2009 Elsevier BM. All rights reserved.
引用
收藏
页码:91 / 98
页数:8
相关论文
共 50 条
  • [41] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Song, Qinbao
    Ni, Jingjie
    Wang, Guangtao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
  • [42] High-dimensional data clustering using k-means subspace feature selection
    Wang, Xiao-Dong
    Chen, Rung-Ching
    Yan, Fei
    Journal of Network Intelligence, 2019, 4 (03): : 80 - 87
  • [43] Clustering-based Acceleration for High-dimensional Gaussian Filtering
    Oishi, Sou
    Fukushima, Norishige
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS (SIGMAP), 2021, : 65 - 72
  • [44] An effective clustering scheme for high-dimensional data
    He, Xuansen
    He, Fan
    Fan, Yueping
    Jiang, Lingmin
    Liu, Runzong
    Maalla, Allam
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 45001 - 45045
  • [45] Approximated clustering of distributed high-dimensional data
    Kriegel, HP
    Kunath, P
    Pfeifle, M
    Renz, M
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 432 - 441
  • [46] Clustering High-Dimensional Noisy Categorical Data
    Tian, Zhiyi
    Xu, Jiaming
    Tang, Jen
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (548) : 3008 - 3019
  • [47] Subspace selection for clustering high-dimensional data
    Baumgartner, C
    Plant, C
    Kailing, K
    Kriegel, HP
    Kröger, P
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 11 - 18
  • [48] The Role of Hubness in Clustering High-Dimensional Data
    Tomasev, Nenad
    Radovanovic, Milos
    Mladenic, Dunja
    Ivanovic, Mirjana
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (03) : 739 - 751
  • [49] An Initialization Method for Clustering High-Dimensional Data
    Chen, Luying
    Chen, Lifei
    Jiang, Qingshan
    Wang, Beizhan
    Shi, Liang
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 444 - +
  • [50] Clustering of imbalanced high-dimensional media data
    Brodinova, Sarka
    Zaharieva, Maia
    Filzmoser, Peter
    Ortner, Thomas
    Breiteneder, Christian
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2018, 12 (02) : 261 - 284