Group Matrix Factorization for Scalable Topic Modeling

Cited by: 0

Authors:

Wang, Quan [1]

Cao, Zheng [2]

Xu, Jun [3]

Li, Hang [3]

Affiliations:

[1] Peking Univ, MOE Microsoft Key Lab Stat & Informat Technol, Beijing, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China

[3] Microsoft Res Asia, Beijing, Peoples R China

Source:

SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2012

Keywords:

Matrix Factorization; Topic Modeling; Large Scale

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, search relevance ranking, document classification, and so on. One of the major challenges in topic modeling is to deal with large datasets and large numbers of topics in real-world applications. In this paper, we investigate techniques for scaling up the non-probabilistic topic modeling approaches such as RLSI and NMF. We propose a general topic modeling method, referred to as Group Matrix Factorization (GMF), to enhance the scalability and efficiency of the non-probabilistic approaches. GMF assumes that the text documents have already been categorized into multiple semantic classes, and there exist class-specific topics for each of the classes as well as shared topics across all classes. Topic modeling is then formalized as a problem of minimizing a general objective function with regularizations and/or constraints on the class-specific topics and shared topics. In this way, the learning of class-specific topics can be conducted in parallel, and thus the scalability and efficiency can be greatly improved. We apply GMF to RLSI and NMF, obtaining Group RLSI (GRLSI) and Group NMF (GNMF) respectively. Experiments on a Wikipedia dataset and a real-world web dataset, each containing about 3 million documents, show that GRLSI and GNMF can greatly improve RLSI and NMF in terms of scalability and efficiency. The topics discovered by GRLSI and GNMF are coherent and have good readability. Further experiments on a search relevance dataset, containing 30,000 labeled queries, show that the use of topics learned by GRLSI and GNMF can significantly improve search relevance.
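The abstract describes splitting topics into shared and class-specific groups so that the class-specific parts can be learned in parallel. The following is only a toy sketch of that general idea, not the GNMF algorithm from the paper: the abstract does not give the objective, regularizers, or update rules, so the plain squared-error loss, the multiplicative updates, and all names (`group_nmf`, `docs_by_class`, `k_shared`, `k_class`) are assumptions made for illustration.

```python
import numpy as np

def group_nmf(docs_by_class, k_shared, k_class, n_iter=50, eps=1e-9, seed=0):
    """Toy shared + class-specific NMF (illustrative; not the paper's method).

    docs_by_class: list of non-negative term-by-document matrices, one per class.
    Each class c is approximated as D_c ~ [U0, U_c] @ V_c.
    """
    rng = np.random.default_rng(seed)
    n_terms = docs_by_class[0].shape[0]
    U0 = rng.random((n_terms, k_shared))                          # shared topics
    Us = [rng.random((n_terms, k_class)) for _ in docs_by_class]  # class-specific topics
    Vs = [rng.random((k_shared + k_class, D.shape[1])) for D in docs_by_class]

    def loss():
        return sum(np.linalg.norm(D - np.hstack([U0, U]) @ V) ** 2
                   for D, U, V in zip(docs_by_class, Us, Vs))

    history = [loss()]
    for _ in range(n_iter):
        num = np.zeros_like(U0)
        den = np.zeros_like(U0)
        # Each pass of this loop touches only class c's own U_c and V_c,
        # so the passes are independent and could be run in parallel.
        for c, D in enumerate(docs_by_class):
            W = np.hstack([U0, Us[c]])
            Vs[c] *= (W.T @ D) / (W.T @ W @ Vs[c] + eps)
            V0, Vc = Vs[c][:k_shared], Vs[c][k_shared:]
            Us[c] *= (D @ Vc.T) / (W @ Vs[c] @ Vc.T + eps)
            # Accumulate the statistics needed for the shared-topic update.
            recon = np.hstack([U0, Us[c]]) @ Vs[c]
            num += D @ V0.T
            den += recon @ V0.T
        # The shared topics are updated once, from statistics pooled over classes.
        U0 *= num / (den + eps)
        history.append(loss())
    return U0, Us, Vs, history
```

In this sketch only the final `U0` update needs information from every class; everything inside the per-class loop depends on that class's documents alone, which is the property the abstract credits for the scalability gain.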

Pages: 375-384

Page count: 10

