Provable Algorithms for Inference in Topic Models

Cited by: 0
Authors
Arora, Sanjeev [1]
Ge, Rong [2]
Koehler, Frederic [3]
Ma, Tengyu [1]
Moitra, Ankur [4, 5]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Duke Univ, Comp Sci Dept, Durham, NC 27706 USA
[3] Princeton Univ, Dept Math, Princeton, NJ 08544 USA
[4] MIT, Dept Math, Cambridge, MA 02139 USA
[5] MIT, CSAIL, Cambridge, MA 02139 USA
Keywords
LASSO
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, there has been considerable progress on designing algorithms with provable guarantees, typically using linear algebraic methods, for parameter learning in latent variable models. But designing provable algorithms for inference has proven to be more challenging. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We also show lower bounds: for shorter documents, it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.
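To illustrate what a linear estimator for topic proportions can look like, here is a minimal sketch in Python. It is not the estimator from the paper: it assumes a known topic-word matrix `A` (one column per topic) and uses the Moore-Penrose pseudoinverse as a stand-in left inverse `B`, so that `B @ A = I` and applying `B` to a document's empirical word frequencies gives an unbiased estimate of the proportions. The abstract's small-variance property depends on how the inverse is chosen, which this sketch does not attempt to reproduce. All names and sizes (`left_inverse`, `estimate_topic_proportions`, `V`, `K`, `n_words`) are hypothetical.

```python
import numpy as np


def left_inverse(A):
    """Return B with B @ A = I, assuming A has full column rank.

    Assumption: the pseudoinverse is only a stand-in; other left
    inverses exist and may give lower-variance estimates.
    """
    return np.linalg.pinv(A)


def estimate_topic_proportions(B, word_counts):
    """Apply the linear map B to the document's empirical word frequencies."""
    counts = np.asarray(word_counts, dtype=float)
    return B @ (counts / counts.sum())


# Synthetic setup (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
V, K, n_words = 200, 5, 300

# V x K topic-word matrix: each column is one topic's distribution over words.
A = rng.dirichlet(np.full(V, 0.5), size=K).T

# True topic proportions of a single document.
x = rng.dirichlet(np.ones(K))

# Word distribution of the document, renormalized against rounding error.
p = A @ x
p = p / p.sum()

# Sample a short document and estimate its topic proportions linearly.
counts = rng.multinomial(n_words, p)
B = left_inverse(A)
x_hat = estimate_topic_proportions(B, counts)

print("true     :", np.round(x, 3))
print("estimate :", np.round(x_hat, 3))
```

Because the expected word-frequency vector is `A @ x`, any `B` with `B @ A = I` is unbiased. The estimate is not constrained to be nonnegative or to sum to one, so a projection onto the simplex may be wanted in practice.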
Pages: 9