MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Cited by: 7
Authors
Pang, Xiongwen [1]
Wan, Benshuai [2]
Li, Huifang [1]
Lin, Weiwei [3]
Affiliations
[1] South China Normal Univ, Sch Comp, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Nanhai Rural Commercial Bank Co Ltd, Dept Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Big Data; Latent Dirichlet Allocation; Micro-Blog; Social Network; Topic Mining;
DOI
10.4018/IJGHPC.2016100106
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Latent Dirichlet Allocation (LDA) is an efficient text-mining method, but applying it directly to Chinese micro-blog texts works poorly because micro-blogs are more social, brief, and closely related to one another. Building on LDA, this paper proposes the Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to support topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into account. As a result, MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for model inference. Experimental results on real datasets show that MR-LDA offers an effective solution to text mining for Chinese micro-blogs.
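The two ingredients named in the abstract, aggregating short posts into longer documents and Gibbs-sampling inference, can be illustrated with a minimal Python sketch. It is not the authors' MR-LDA: the sampler below is collapsed Gibbs sampling for plain LDA and ignores relations between documents. The grouping key (here a hypothetical user id), the function names `aggregate_posts` and `lda_gibbs`, the hyperparameter values, and the toy posts are all assumptions made for illustration; the abstract does not specify how micro-blogs are grouped.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge short posts that share a grouping key into one pseudo-document.

    `posts` is a list of (group_key, tokens) pairs. The choice of key is an
    assumption; the abstract does not say how micro-blogs are grouped.
    """
    merged = defaultdict(list)
    for key, tokens in posts:
        merged[key].extend(tokens)
    return dict(merged)


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not MR-LDA).

    Returns per-document topic counts, per-topic word counts, and the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


# Toy usage with made-up posts from two hypothetical users.
posts = [
    ("user_a", ["rain", "storm"]),
    ("user_a", ["storm", "flood"]),
    ("user_b", ["goal", "match"]),
    ("user_b", ["match", "team"]),
]
pseudo_docs = aggregate_posts(posts)
doc_topic, topic_word, vocab = lda_gibbs(
    list(pseudo_docs.values()), n_topics=2, n_iters=50
)
```

Aggregation lengthens each document so that word co-occurrence statistics are less sparse, which is the short-text problem the abstract refers to. A relation-aware model would additionally change the generative process, which this sketch does not attempt.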
Pages: 100 - 113
Page count: 14
Related papers
50 items in total
  • [31] Short Text Classification Based on Hierarchical Heterogeneous Graph and LDA Fusion
    Xu, Xinlan
    Li, Bo
    Shen, Yuhao
    Luo, Bing
    Zhang, Chao
    Hao, Fei
    ELECTRONICS, 2023, 12 (12)
  • [32] LF-LDA: A Topic Model for Multi-label Classification
    Zhang, Yongjun
    Ma, Jialin
    Wang, Zijian
    Chen, Bolun
    ADVANCES IN INTERNETWORKING, DATA & WEB TECHNOLOGIES, EIDWT-2017, 2018, 6 : 618 - 628
  • [33] Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data
    Christoph Weisser
    Christoph Gerloff
    Anton Thielmann
    Andre Python
    Arik Reuter
    Thomas Kneib
    Benjamin Säfken
    Computational Statistics, 2023, 38 : 647 - 674
  • [34] Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data
    Weisser, Christoph
    Gerloff, Christoph
    Thielmann, Anton
    Python, Andre
    Reuter, Arik
    Kneib, Thomas
    Saefken, Benjamin
    COMPUTATIONAL STATISTICS, 2023, 38 (02) : 647 - 674
  • [35] Classification of Text Documents Based on a Probabilistic Topic Model
    Karpovich, S. N.
    Smirnov, A. V.
    Teslya, N. N.
    SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING, 2019, 46 (05) : 314 - 320
  • [36] Classification of Text Documents Based on a Probabilistic Topic Model
    S. N. Karpovich
    A. V. Smirnov
    N. N. Teslya
    Scientific and Technical Information Processing, 2019, 46 : 314 - 320
  • [37] Biterm Pseudo Document Topic Model for Short Text
    Jiang, Lan
    Lu, Hengyang
    Xu, Ming
    Wang, Chongjun
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016: 865 - 872
  • [38] Short text optimized topic model for service clustering
    Lu J.-W.
    Zheng J.-H.
    Li D.-N.
    Xu J.
    Xiao G.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (12): 2416 - 2425+2444
  • [39] Semantic Augmented Topic Model over Short Text
    Li, Lingyun
    Sun, Yawei
    Wang, Cong
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018: 652 - 656
  • [40] Hot Topic Discovery across Social Networks Based on LDA Model
    Liu, Chang
    Hue, RuiLin
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (11): 3935 - 3949