Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests

Cited by: 0
Authors
Edwards, David [1]
de Abreu, Gabriel C. G. [1]
Labouriau, Rodrigo [1]
Affiliations
[1] Aarhus Univ, Fac Agr Sci, Inst Genet & Biotechnol, Aarhus, Denmark
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
MAXIMUM-LIKELIHOOD; NETWORKS; RECONSTRUCTION; TREES
DOI
10.1186/1471-2105-11-18
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems. Results: We extend Chow and Liu's approach in two ways: first, to find the forest optimizing a penalized likelihood criterion, for example AIC or BIC, and second, to handle data with both discrete and Gaussian variables. We apply the approach to three datasets: two from gene expression studies and the third from a genetics of gene expression study. The minimal BIC forest supplements a conventional analysis of differential expression by providing a tentative network for the differentially expressed genes. In the genetics of gene expression context the method identifies a network approximating the joint distribution of the DNA markers and the gene expression levels. Conclusions: The approach is generally useful as a preliminary step towards understanding the overall dependence structure of high-dimensional discrete and/or continuous data. Trees and forests are unrealistically simple models for biological systems, but can provide useful insights. Uses include the following: identification of distinct connected components, which can be analysed separately (dimension reduction); identification of neighbourhoods for more detailed analyses; as initial models for search algorithms with a larger search space, for example decomposable models or Bayesian networks; and identification of interesting features, such as hub nodes.
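The idea summarized in the abstract can be sketched for the purely discrete case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate edge is weighted by its pairwise deviance (2n times the empirical mutual information) minus a penalty proportional to the degrees of freedom the edge adds, only edges with positive penalized weight are kept, and Kruskal's algorithm is run on what remains. The function names `mutual_information` and `min_penalized_forest`, the simple degrees-of-freedom formula (levels minus one for each variable, multiplied together), and the omission of Gaussian and mixed edges are all assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b]))
        for (a, b), c in cxy.items()
    )


def min_penalized_forest(columns, penalty="bic"):
    """Return the edges (i, j) of a forest chosen by penalized likelihood.

    columns: list of equal-length sequences of discrete values (hypothetical
    input layout, one sequence per variable).
    penalty: "bic" uses log(n) per parameter, anything else uses 2 (AIC).
    """
    n = len(columns[0])
    kappa = math.log(n) if penalty == "bic" else 2.0
    levels = [len(set(col)) for col in columns]

    # Penalized weight of each candidate edge: deviance minus penalty.
    candidates = []
    for i, j in combinations(range(len(columns)), 2):
        deviance = 2.0 * n * mutual_information(columns[i], columns[j])
        df = (levels[i] - 1) * (levels[j] - 1)  # assumed: no empty-cell correction
        weight = deviance - kappa * df
        if weight > 0:  # an edge that does not improve the criterion is dropped
            candidates.append((weight, i, j))

    # Kruskal's algorithm with a union-find structure, heaviest edges first.
    candidates.sort(reverse=True)
    parent = list(range(len(columns)))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    forest = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge keeps the graph acyclic
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Toy data: b copies a, c is exactly independent of both.
a = [0] * 50 + [1] * 50
b = list(a)
c = [0, 1] * 50
print(min_penalized_forest([a, b, c]))
```

Because edges with non-positive penalized weight are discarded before the spanning step, the result may be a forest with several connected components rather than a single tree, which is what allows the components to be analysed separately.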
Pages: 13
Related papers
50 in total
  • [31] Inference and Verification of Probabilistic Graphical Models from High-Dimensional Data
    Ma, Yinjiao
    Damazyn, Kevin
    Klinger, Jakob
    Gong, Haijun
    [J]. DATA INTEGRATION IN THE LIFE SCIENCES, DILS 2015, 2015, 9162 : 223 - 239
  • [32] High-dimensional Gaussian graphical models on network-linked data
    Li, Tianxi
    Qian, Cheng
    Levina, Elizaveta
    Zhu, Ji
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [33] Graphical Comparison of High-Dimensional Distributions
    Modarres, Reza
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2020, 88 (03) : 698 - 714
  • [34] High-dimensional generalized linear models incorporating graphical structure among predictors
    Zhou, Shengbin
    Zhou, Jingke
    Zhang, Bo
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2019, 13 (02) : 3161 - 3194
  • [35] Graphical Local Genetic Algorithm for High-Dimensional Log-Linear Models
    Roach, Lyndsay
    Gao, Xin
    [J]. MATHEMATICS, 2023, 11 (11)
  • [36] A two-step method for estimating high-dimensional Gaussian graphical models
    Yang, Yuehan
    Zhu, Ji
    [J]. Science China Mathematics, 2020, 63 (06) : 1203 - 1218
  • [37] Sequential change-point detection in high-dimensional Gaussian graphical models
    Keshavarz, Hossein
    Michailidis, George
    Atchade, Yves
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [38] Direct estimation of differential networks under high-dimensional nonparanormal graphical models
    Zhang, Qingyang
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2020, 48 (02) : 187 - 203
  • [39] REGULARIZED RANK-BASED ESTIMATION OF HIGH-DIMENSIONAL NONPARANORMAL GRAPHICAL MODELS
    Xue, Lingzhou
    Zou, Hui
    [J]. ANNALS OF STATISTICS, 2012, 40 (05) : 2541 - 2571
  • [40] An Equivalent Measure of Partial Correlation Coefficients for High-Dimensional Gaussian Graphical Models
    Liang, Faming
    Song, Qifan
    Qiu, Peihua
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (511) : 1248 - 1265