Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

Authors
Terenin, Alexander [1]
Magnusson, Mans [2, 3]
Jonsson, Leif [4, 5]
Affiliations
[1] Imperial Coll London, London, England
[2] Uppsala Univ, Uppsala, Sweden
[3] Aalto Univ, Espoo, Finland
[4] Ericsson AB, Stockholm, Sweden
[5] Linkoping Univ, Linkoping, Sweden
Funding
Academy of Finland; Swedish Research Council
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To scale non-parametric extensions of probabilistic topic models such as latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language, an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8 million documents and 768 million tokens, training on a single multi-core machine in under four days.
Pages: 2925-2934
Page count: 10
Related papers
  • [1] Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models
    Burkhardt, Sophie
    Kramer, Stefan
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II, 2017, 10535 : 189 - 204
  • [4] Hierarchical topic modeling with nested hierarchical Dirichlet process
    Ding, Yi-qun
    Li, Shan-ping
    Zhang, Zhen
    Shen, Bin
    [J]. JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE A, 2009, 10 (06): : 858 - 867
  • [5] Flow Hierarchical Dirichlet Process for Complex Topic Modeling
    Han Z.-M.
    Zhang M.-M.
    Li M.-Q.
    Duan D.-G.
    Chen Y.
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2019, 42 (07): : 1539 - 1552
  • [6] Scalable Training of Hierarchical Topic Models
    Chen, Jianfei
    Zhu, Jun
    Lu, Jie
    Liu, Shixia
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (07): : 826 - 839
  • [7] Online Topic Evolution Modeling Based on Hierarchical Dirichlet Process
    Ma, Tao
    Qu, Dacheng
    Ma, Rui
    Feng, Wei
    Li, Kan
    [J]. 2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016), 2016, : 400 - 405
  • [8] Topic model for graph mining based on hierarchical Dirichlet process
    Zhang, Haibin
    Huating, Shang
    Wu, Xianyi
    [J]. STATISTICAL THEORY AND RELATED FIELDS, 2020, 4 (01) : 66 - 77
  • [9] Truncated Poisson–Dirichlet approximation for Dirichlet process hierarchical models
    Junyi Zhang
    Angelos Dassios
    [J]. Statistics and Computing, 2023, 33
  • [10] Integration of Knowledge Graph Embedding into Topic Modeling with Hierarchical Dirichlet Process
    Li, Dingcheng
    Dadaneh, Siamak Zamani
    Zhang, Jingyuan
    Li, Ping
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 940 - 950