Momentum Online LDA for Large-scale Datasets

Cited by: 2
Authors
Ouyang, Jihong [1]
Lu, You [1]
Li, Ximing [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China
Keywords
TERM
DOI
10.3233/978-1-61499-419-0-1075
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Modeling large-scale document collections is a significant direction in machine learning research. Online LDA uses stochastic gradient optimization to speed up convergence; however, the high noise of the stochastic gradients slows convergence and degrades performance. In this paper, we employ a momentum term to smooth out the noise of the stochastic gradients and propose an extension of Online LDA, namely Momentum Online LDA (MOLDA). We collect a large-scale corpus of 2M documents to evaluate our model. Experimental results indicate that MOLDA achieves faster convergence and better performance than the state of the art.
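The abstract does not state MOLDA's update rule, so the following is only a generic sketch of how a momentum term can smooth noisy stochastic gradients; it is not the authors' algorithm and does not implement LDA. It assumes a scalar gradient with a fixed true value plus Gaussian noise, and the names `ema_smooth` and `simulate`, along with all parameter values, are hypothetical choices made for illustration.

```python
import random
import statistics


def ema_smooth(grads, beta=0.9):
    """Momentum-style exponential moving average of a gradient sequence.

    velocity_t = beta * velocity_{t-1} + (1 - beta) * grad_t
    """
    velocity = 0.0
    smoothed = []
    for g in grads:
        velocity = beta * velocity + (1.0 - beta) * g
        smoothed.append(velocity)
    return smoothed


def simulate(n_steps=2000, true_grad=1.0, noise_std=5.0, beta=0.9, seed=0):
    """Compare the variance of raw and momentum-smoothed noisy gradients.

    Assumption: each stochastic gradient is the true gradient plus
    independent Gaussian noise, standing in for minibatch noise.
    """
    rng = random.Random(seed)
    raw = [true_grad + rng.gauss(0.0, noise_std) for _ in range(n_steps)]
    smoothed = ema_smooth(raw, beta)
    burn_in = 100  # skip the warm-up while the average starts from zero
    raw_var = statistics.pvariance(raw[burn_in:])
    smooth_var = statistics.pvariance(smoothed[burn_in:])
    return raw_var, smooth_var


raw_var, smooth_var = simulate()
print(f"raw gradient variance:      {raw_var:.3f}")
print(f"smoothed gradient variance: {smooth_var:.3f}")
```

Under these assumptions the smoothed sequence has a much lower variance than the raw one, which is the intuition behind using momentum to damp gradient noise. A larger `beta` smooths more strongly but reacts more slowly to real changes in the gradient.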
Pages: 1075-1076
Number of pages: 2