Distributed MCMC Inference in Dirichlet Process Mixture Models Using Julia

Cited: 7
Authors
Dinari, Or [1 ]
Yu, Angel [2 ]
Freifeld, Oren [1 ]
Fisher, John W., III [2 ]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel
[2] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Funding
US National Science Foundation
DOI
10.1109/CCGRID.2019.00066
Chinese Library Classification
TP3 (Computing Technology; Computer Technology)
Subject classification code
0812
Abstract
Due to the increasing availability of large data sets, the need for general-purpose massively-parallel analysis tools becomes ever greater. In unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet-Process Mixture Model (DPMM), provide a principled Bayesian approach to adapting model complexity to the data. Despite their potential, however, DPMMs have yet to become a popular tool. This is partly due to the lack of user-friendly software tools that can handle large datasets efficiently. Here we show how, using Julia, one can achieve an efficient and easily-modifiable implementation of distributed inference in DPMMs. In particular, we show how a recent parallel MCMC inference algorithm, originally implemented in C++ for a single multi-core machine, can be distributed efficiently across multiple multi-core machines using a distributed-memory model. This leads to speedups, alleviates memory and storage limitations, and lets us learn DPMMs from significantly larger and higher-dimensional datasets. It also turned out that even on a single machine the proposed Julia implementation handles higher dimensions more gracefully (at least for Gaussians) than the original C++ implementation. Finally, we use the proposed implementation to learn a model of image patches and apply the learned model to image denoising. While we speculate that a highly-optimized distributed implementation in, say, C++ could have been faster than the proposed implementation in Julia, from our perspective as machine-learning researchers (as opposed to HPC researchers), the latter also offers practical and monetary value due to its ease of development and level of abstraction.
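The parallel split-merge sampler the abstract refers to is not reproduced here. Purely as an illustration of what MCMC inference in a DPMM computes, the following is a minimal single-machine collapsed Gibbs (Chinese Restaurant Process) sketch in Python for 1-D Gaussian clusters with known variance; every name and hyperparameter (`crp_gibbs`, `alpha`, `sigma`, `tau`) is an assumption for this sketch, not the paper's API or algorithm.

```python
import math
import numpy as np

def _pred_logpdf(xi, s, m, sigma2, tau2):
    # Log posterior-predictive N(mu, v + sigma2) of one point under a cluster
    # with m members summing to s (conjugate N(0, tau2) prior on the mean).
    v = 1.0 / (m / sigma2 + 1.0 / tau2)   # posterior variance of the cluster mean
    mu = v * s / sigma2                   # posterior mean of the cluster mean
    var = v + sigma2                      # predictive variance
    return -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)

def crp_gibbs(x, alpha=1.0, sigma=1.0, tau=3.0, iters=100, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of 1-D Gaussians with
    known variance sigma^2; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.zeros(n, dtype=int)            # start with every point in cluster 0
    counts, sums = {0: n}, {0: float(x.sum())}
    s2, t2 = sigma ** 2, tau ** 2
    for _ in range(iters):
        for i in range(n):
            k = z[i]                      # remove point i from its cluster
            counts[k] -= 1
            sums[k] -= x[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            ks = list(counts)
            logp = [math.log(counts[c]) + _pred_logpdf(x[i], sums[c], counts[c], s2, t2)
                    for c in ks]
            # CRP prior mass alpha for opening a brand-new cluster
            logp.append(math.log(alpha) + _pred_logpdf(x[i], 0.0, 0, s2, t2))
            p = np.exp(np.array(logp) - max(logp))
            j = rng.choice(len(p), p=p / p.sum())
            k_new = ks[j] if j < len(ks) else max(counts, default=-1) + 1
            z[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            sums[k_new] = sums.get(k_new, 0.0) + x[i]
    return z
```

Because the per-cluster sufficient statistics here are just counts and sums, the distributed variant the paper describes amounts to each worker computing these statistics on its own data shard and the label-sampling being coordinated across shards.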
Pages: 518-525
Page count: 8