Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data

Cited by: 0
Authors
Becquet, Celine [1]
Blachon, Sylvain [1]
Jeudy, Baptiste [2]
Boulicaut, Jean-Francois [2]
Gandrillon, Olivier [1]
Affiliations
[1] Univ Lyon 1, Equipe Signalisat & Identites Cellulaires, Ctr Genet Mol & Cellulaire, CNRS,UMR 5534, F-69622 Villeurbanne, France
[2] Inst Natl Sci Appl, Lab Ingn Syst Informat, F-69621 Villeurbanne, France
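Source
GENOME BIOLOGY | 2002 / Vol. 3 / Issue 12
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Subject classification codes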
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data.
Results: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function.
Conclusions: We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques.
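As a rough illustration of what association-rule discovery computes on discretized expression data, the sketch below scores rules between pairs of genes by support and confidence. It is not the miner used in the paper: the gene sets, the thresholds, the names `support` and `mine_rules`, and the reading of "strong" as "confidence at or above a chosen threshold" are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data: each entry stands for one SAGE library and holds the
# genes judged "over-expressed" there after some discretization step.
libraries = [
    {"RPL3", "RPS6", "GAPDH"},
    {"RPL3", "RPS6"},
    {"RPL3", "RPS6", "JUN"},
    {"GAPDH", "JUN"},
]

def support(itemset, rows):
    """Fraction of rows that contain every gene in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def mine_rules(rows, min_support=0.5, min_confidence=0.9):
    """Return (antecedent, consequent, support, confidence) for gene pairs.

    Only single-gene antecedents and consequents are considered, which keeps
    the sketch short; real miners handle larger itemsets.
    """
    genes = sorted(set().union(*rows))
    rules = []
    for a, b in combinations(genes, 2):
        pair_support = support({a, b}, rows)
        if pair_support < min_support:
            continue  # the pair co-occurs too rarely to be of interest
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, rows)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

rules = mine_rules(libraries)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

On this toy input only the two ribosomal-protein genes pass both thresholds, in both directions. Changing how the libraries are discretized changes which gene sets appear in each row, and therefore which rules are extracted.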
Pages: 16
Related papers
50 in total
  • [1] Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data
    Céline Becquet
    Sylvain Blachon
    Baptiste Jeudy
    Jean-Francois Boulicaut
    Olivier Gandrillon
    Genome Biology, 3 (12)
  • [2] Analysis of large-scale gene expression data
    Sherlock, G
    CURRENT OPINION IN IMMUNOLOGY, 2000, 12 (02) : 201 - 205
  • [3] Finding regulatory modules through large-scale gene-expression data analysis
    Kloster, M
    Tang, C
    Wingreen, NS
    BIOINFORMATICS, 2005, 21 (07) : 1172 - 1179
  • [4] A modular approach for integrative analysis of large-scale gene-expression and drug-response data
    Kutalik, Zoltan
    Beckmann, Jacques S.
    Bergmann, Sven
    NATURE BIOTECHNOLOGY, 2008, 26 (05) : 531 - 539
  • [5] A modular approach for integrative analysis of large-scale gene-expression and drug-response data
    Zoltán Kutalik
    Jacques S Beckmann
    Sven Bergmann
    Nature Biotechnology, 2008, 26 : 531 - 539
  • [6] Challenges and prospects in the analysis of large-scale gene expression data
    Ihmels, JH
    Bergmann, S
    BRIEFINGS IN BIOINFORMATICS, 2004, 5 (04) : 313 - 327
  • [7] Enhancing Power Grid Data Analysis with Fusion Algorithms for Efficient Association Rule Mining in Large-Scale Datasets
    Sun, Qiongqiong
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2024, 19 (03) : 1 - 15
  • [8] Boolean Association Rule Mining on Microarray Gene Expression Data
    Vengateshkumar, R.
    Alagukumar, S.
    Lawrance, R.
    ADVANCED COMPUTING AND INTELLIGENT ENGINEERING, 2020, 1082 : 99 - 111
  • [9] Statistical Learning of Large-Scale Genetic Data: How to Run a Genome-Wide Association Study of Gene-Expression Data Using the 1000 Genomes Project Data
    Anton Sugolov
    Eric Emmenegger
    Andrew D. Paterson
    Lei Sun
    Statistics in Biosciences, 2024, 16 : 250 - 264
  • [10] Statistical Learning of Large-Scale Genetic Data: How to Run a Genome-Wide Association Study of Gene-Expression Data Using the 1000 Genomes Project Data
    Sugolov, Anton
    Emmenegger, Eric
    Paterson, Andrew D.
    Sun, Lei
    STATISTICS IN BIOSCIENCES, 2024, 16 (01) : 250 - 264