Extracting gene expression patterns and identifying co-expressed genes from microarray data reveals biologically responsive processes

Cited by: 42
Authors
Chou, Jeff W. [1]
Zhou, Tong [2, 3]
Kaufmann, William K. [2, 3]
Paules, Richard S. [1]
Bushel, Pierre R. [1]
Affiliations
[1] Natl Inst Environm Hlth Sci, Microarray Grp, Res Triangle Pk, NC 27709 USA
[2] Univ N Carolina, Ctr Environm Hlth & Susceptibil, Dept Pathol & Lab Med, Chapel Hill, NC 27599 USA
[3] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
DOI
10.1186/1471-2105-8-427
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: A common observation in the analysis of gene expression data is that many genes display similar expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the identification of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated EPIG. The approach uses the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.
Results: By evaluating the correlations among profiles, the magnitude of variation in gene expression profiles, and the profile signal-to-noise ratios, EPIG extracts a set of patterns representing co-expressed genes. The method is shown to work well with a simulated data set and with microarray data obtained from time-series studies of dauer recovery and L1 starvation in C. elegans, and of ultraviolet (UV) or ionizing radiation (IR)-induced DNA damage in diploid human fibroblasts. With the simulated data set, EPIG extracted the appropriate number of patterns, which were more stable and homogeneous than the patterns determined using the CLICK or CAST clustering algorithms. However, CLICK performed better than EPIG and CAST with respect to the average correlation between clusters/patterns of the simulated data. With real biological data, EPIG extracted more dauer-specific patterns than CLICK. Furthermore, analysis of the IR/UV data revealed 18 unique patterns, and EPIG identified 2661 genes out of approximately 17,000 as significantly expressed and assigned them to these patterns. The time-dependent patterns displayed both similar and dissimilar responses between IR and UV treatments. Gene Ontology analysis applied to each pattern-related subset of co-expressed genes revealed underlying biological processes affected by IR- and/or UV-induced DNA damage.
Conclusion: EPIG was competitive with CLICK and performed better than CAST in extracting patterns from simulated data. EPIG extracted more biologically informative patterns and co-expressed genes from both C. elegans and IR/UV-treated human fibroblasts. Gene Ontology analysis of the genes in the patterns extracted by EPIG revealed several key biological categories related to p53-dependent cell cycle control in the IR/UV data. Among them were mitotic cell cycle, DNA replication, DNA repair, cell cycle checkpoint, and G(0)-like status transition. EPIG can be applied to data sets from a variety of experimental designs.
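The abstract names three criteria (correlation among profiles, magnitude of variation, and signal-to-noise ratio) but does not give formulas or thresholds. The following is only a toy sketch of the general idea of assigning genes to patterns by correlation and magnitude; it is not the EPIG algorithm. The function names, the thresholds `min_r` and `min_magnitude`, and the use of Pearson correlation are all assumptions made for illustration, and the signal-to-noise criterion is omitted because it would require replicate-level data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_to_patterns(profiles, patterns, min_r=0.8, min_magnitude=1.0):
    """Assign each gene to its best-correlated pattern.

    profiles: dict mapping gene name -> list of expression values (e.g. log ratios)
    patterns: dict mapping pattern name -> representative profile
    min_r, min_magnitude: hypothetical cut-offs, not values from the paper
    """
    assigned = {}
    for gene, profile in profiles.items():
        # Magnitude filter: skip genes whose profile never moves far from zero.
        if max(abs(v) for v in profile) < min_magnitude:
            continue
        best_name, best_r = None, min_r
        for name, pattern in patterns.items():
            r = pearson(profile, pattern)
            if r >= best_r:
                best_name, best_r = name, r
        # Genes that correlate with no pattern at min_r or above stay unassigned.
        if best_name is not None:
            assigned[gene] = best_name
    return assigned


# Made-up example data: two patterns and four genes over four time points.
example_patterns = {"up": [0, 1, 2, 3], "down": [3, 2, 1, 0]}
example_profiles = {
    "g1": [0, 1, 2, 3],          # follows "up"
    "g2": [3, 2, 1, 0],          # follows "down"
    "g3": [0.0, 0.1, 0.2, 0.3],  # right shape, but magnitude too small
    "g4": [0, 2, 0, 2],          # large magnitude, matches neither pattern
}
example_result = assign_to_patterns(example_profiles, example_patterns)
```

In this made-up example only `g1` and `g2` are assigned; `g3` fails the magnitude filter and `g4` fails the correlation threshold, which mirrors the abstract's point that only a subset of genes is categorized to patterns.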
Pages: 16