Momentum Online LDA for Large-scale Datasets

Cited by: 2
Authors
Ouyang, Jihong [1]
Lu, You [1]
Li, Ximing [1]
Affiliation
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China
Keywords
TERM;
DOI
10.3233/978-1-61499-419-0-1075
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Modeling large-scale document collections is an important direction in machine learning research. Online LDA uses stochastic gradient optimization to speed up convergence; however, the high noise in stochastic gradients slows convergence and degrades performance. In this paper, we use a momentum term to smooth out this noise and propose an extension of Online LDA, namely Momentum Online LDA (MOLDA). We collect a large-scale corpus of 2M documents to evaluate our model. Experimental results indicate that MOLDA achieves faster convergence and better performance than the state of the art.
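The abstract does not spell out the MOLDA update rule, so the following is only a minimal sketch of the general idea it names: a classical (heavy-ball) momentum term applied to noisy stochastic gradients. The toy objective f(x) = 0.5 * x^2, the Gaussian noise model, and the names `momentum_step` and `run` are illustrative assumptions, not taken from the paper, and the sketch does not involve LDA's variational parameters.

```python
import random


def momentum_step(param, velocity, noisy_grad, lr=0.1, beta=0.9):
    """One classical momentum update on a scalar parameter (illustrative only)."""
    # The velocity is a geometrically decaying sum of past gradients,
    # so independent noise in individual gradients is partly averaged out.
    velocity = beta * velocity + noisy_grad
    param = param - lr * velocity
    return param, velocity


def run(steps=500, lr=0.01, beta=0.9, seed=0):
    """Minimize the toy objective f(x) = 0.5 * x^2 from noisy gradients."""
    rng = random.Random(seed)
    param, velocity = 5.0, 0.0
    for _ in range(steps):
        # The true gradient of f is x; the Gaussian term is an assumed
        # stand-in for minibatch noise.
        noisy_grad = param + rng.gauss(0.0, 1.0)
        param, velocity = momentum_step(param, velocity, noisy_grad, lr, beta)
    return param


if __name__ == "__main__":
    print("final parameter:", run())
```

Setting `beta=0.0` reduces `momentum_step` to plain stochastic gradient descent, which gives a simple baseline for comparing how the two variants behave on the same noise sequence.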
Pages: 1075-1076
Page count: 2