Clustering methods for analyzing large data sets: Gonad development, a study case

Cited by: 4
Authors
Hennetin, Jerome [1 ]
Bellis, Michel [1 ]
Affiliation
[1] CNRS, CRBM, Montpellier, France
Keywords
DOI
10.1016/S0076-6879(06)11021-6
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
With the development of data set repositories, it is now possible to collate large numbers of related results by gathering data from experiments that were carried out in different laboratories and that either address similar questions or use a single type of biological material under different conditions. To address the challenge posed by the heterogeneous nature of multiple data sources, this chapter presents several methods used routinely for assessing data quality (i.e., the reproducibility of replicates and the similarity between experimental points obtained under identical or similar biological conditions). Because gene clustering on large data sets is not straightforward, the chapter also presents a rapid gene clustering method that translates variation profiles from an ordered set of comparisons into chains of symbols. In addition, it shows that lists of genes assembled on the basis of a common term in their functional description can be used to find the most informative comparisons and to construct from them exemplar chains of symbols that are useful for clustering similar genes. Finally, the chapter extends this symbolic approach to the overall set of biological conditions under study and shows how the resulting collection of variation profiles can be used to construct transcriptional networks, which in turn can serve as powerful tools for gene clustering.
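The idea of turning a variation profile into a chain of symbols can be illustrated with a minimal sketch. It is not the chapter's actual encoding: the three-letter alphabet (`I` for increase, `D` for decrease, `N` for no change), the log-ratio threshold, the function names, and the gene names and values are all assumptions made for illustration. Genes whose chains are identical across the ordered comparisons fall into the same group.

```python
from collections import defaultdict


def symbolize(log_ratios, threshold=1.0):
    """Translate an ordered list of log-ratios into a chain of symbols.

    Assumed alphabet: 'I' (increase), 'D' (decrease), 'N' (no change).
    The threshold is a hypothetical cutoff, not a value from the chapter.
    """
    symbols = []
    for value in log_ratios:
        if value >= threshold:
            symbols.append("I")
        elif value <= -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def cluster_by_chain(profiles, threshold=1.0):
    """Group genes that share exactly the same chain of symbols."""
    clusters = defaultdict(list)
    for gene, log_ratios in profiles.items():
        clusters[symbolize(log_ratios, threshold)].append(gene)
    return dict(clusters)


# Hypothetical variation profiles over three ordered comparisons.
profiles = {
    "geneA": [1.8, 0.1, -1.4],
    "geneB": [2.3, -0.2, -1.1],
    "geneC": [-1.6, 0.0, 0.3],
}
clusters = cluster_by_chain(profiles)
```

Because each gene is reduced to a short string, grouping is a single pass over the data set with a dictionary lookup per gene, which is what makes a symbolic encoding attractive for large collections.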
Pages: 387 / +
Page count: 22