Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data

Authors
Becquet, Celine [1]
Blachon, Sylvain [1]
Jeudy, Baptiste [2]
Boulicaut, Jean-Francois [2]
Gandrillon, Olivier [1]
Affiliations
[1] Univ Lyon 1, Equipe Signalisat & Identites Cellulaires, Ctr Genet Mol & Cellulaire, CNRS,UMR 5534, F-69622 Villeurbanne, France
[2] Inst Natl Sci Appl, Lab Ingn Syst Informat, F-69621 Villeurbanne, France
Source
GENOME BIOLOGY | 2002 / Vol. 3 / No. 12
DOI
Not available
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of prior biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A major international research effort has since produced new algorithms for tackling difficult contexts, and these are particularly suited to the analysis of large gene-expression matrices. To validate the ARD technique, we applied it to freely available human serial analysis of gene expression (SAGE) data.

Results: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association-rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In every case, the extracted collections of rules indicated that a very strong co-regulation of mRNAs encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some of which point to unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag and to propose a function for an expressed sequence tag encoding a protein of unknown function.

Conclusions: We show that ARD is a promising technique that is complementary to existing gene-expression clustering techniques.
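To make the idea of mining rules from discretized expression data concrete, here is a minimal sketch using the textbook support/confidence definitions of association rules. It is not the authors' miner, normalization, or discretization method: the single global over-expression threshold, the toy counts, the tag names (`tag_A`, `tag_B`, `tag_C`) and the function names are all hypothetical, and the values are assumed to be already normalized.

```python
from itertools import combinations

# Hypothetical toy data: each dict is one SAGE library (sample), mapping
# tag names to already-normalized counts. All numbers are made up.
samples = [
    {"tag_A": 120, "tag_B": 110, "tag_C": 2},
    {"tag_A": 130, "tag_B": 125, "tag_C": 40},
    {"tag_A": 10, "tag_B": 8, "tag_C": 60},
    {"tag_A": 115, "tag_B": 105, "tag_C": 5},
]


def discretize(matrix, threshold):
    """Turn each sample into the set of tags whose count exceeds `threshold`.

    A single global cut-off is an illustrative assumption only.
    """
    return [{tag for tag, value in sample.items() if value > threshold}
            for sample in matrix]


def support(transactions, itemset):
    """Fraction of samples in which every tag of `itemset` is over-expressed."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def strong_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-tag rules lhs => rhs."""
    tags = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(tags, 2):
        pair_support = support(transactions, {a, b})
        # Skip pairs that never co-occur or are too rare.
        if pair_support == 0 or pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs => rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


transactions = discretize(samples, threshold=50)
for rule in strong_rules(transactions, min_support=0.5, min_confidence=0.9):
    print(rule)
```

On this toy data, `tag_A` and `tag_B` are over-expressed together in three of four samples and never apart, so both `tag_A => tag_B` and `tag_B => tag_A` come out with support 0.75 and confidence 1.0. Real miners enumerate itemsets of any size and prune the search far more cleverly; the restriction to single-tag rules here only keeps the sketch short.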
Pages: 16