Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering

Authors
Li, Ying [1 ]
Wang, Nan [1 ]
Zhang, Chaoyang [1 ]
Perkins, Edward J. [2 ]
Gong, Ping [3 ]
Affiliations
[1] Univ So Mississippi, Hattiesburg, MS 39401 USA
[2] US Army Engn Res & Dev Ctr, Vicksburg, MS 39180 USA
[3] SpecPro Inc, Vicksburg, MS 39180 USA
Keywords
Biomarker; Classification; Decision tree; Support vector machine; Clustering; Earthworm microarray; Cancer classification
DOI
10.1109/IJCBS.2009.134
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Monitoring, assessing, and predicting the environmental risks posed by chemicals demands rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and clustering (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to identify a total of 869 genes whose expression changed significantly relative to controls after exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance across the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Unsupervised clustering showed that the top 100 classifier genes assigned the largest number of the 248 worm samples to the three reference clusters obtained with all 14,188 filtered genes, suggesting that these top-ranked genes may be potential biomarker candidates. This study demonstrates that the DAC pipeline can be used to identify a small set of biomarker genes from high-dimensional datasets and to generate a reliable SVM classification model for multiple classes.
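The abstract does not say how the "overall weight of significance" was computed or how cluster assignments were scored, so the following is only a minimal sketch of two steps in such a pipeline under stated assumptions. It assumes each classification method yields a per-gene weight, that weights are normalized within each method and then summed, and that agreement with the reference clusters is counted under the best one-to-one relabeling of cluster IDs. The function names `rank_genes` and `cluster_agreement` and all input shapes are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def rank_genes(method_weights):
    """Rank genes by summed, per-method-normalized weights (highest first).

    method_weights maps a method name to {gene: weight}. The normalize-then-sum
    rule is an assumption made for illustration, not the paper's scheme.
    """
    totals = {}
    for weights in method_weights.values():
        scale = sum(weights.values())
        if scale == 0:
            continue  # a method that selected no genes contributes nothing
        for gene, weight in weights.items():
            totals[gene] = totals.get(gene, 0.0) + weight / scale
    # Sort by descending score; break ties by gene name for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def cluster_agreement(reference, candidate):
    """Count samples placed in their reference cluster under the best
    one-to-one relabeling of candidate cluster IDs.

    Assumes the candidate clustering has no more clusters than the reference;
    brute-force search over relabelings is only practical for a few clusters.
    """
    ref_labels = sorted(set(reference))
    cand_labels = sorted(set(candidate))
    best = 0
    for perm in permutations(ref_labels, len(cand_labels)):
        mapping = dict(zip(cand_labels, perm))
        hits = sum(mapping[c] == r for r, c in zip(reference, candidate))
        best = max(best, hits)
    return best
```

In use, one would take the first k entries of `rank_genes(...)` as the top-k classifier genes, cluster the samples on those genes, and compare the result against the reference clustering with `cluster_agreement`. Repeating this for several values of k gives a comparison of the kind the abstract describes for the top 100 genes.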
Pages: 23 / +
Number of pages: 2