MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Cited by: 7
Authors
Pang, Xiongwen [1]
Wan, Benshuai [2]
Li, Huifang [1]
Lin, Weiwei [3]
Affiliations
[1] South China Normal Univ, Sch Comp, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Nanhai Rural Commercial Bank Co Ltd, Dept Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Big Data; Latent Dirichlet Allocation; Micro-Blog; Social Network; Topic Mining
DOI
10.4018/IJGHPC.2016100106
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Latent Dirichlet Allocation (LDA) is an efficient text-mining method, but applying it directly to Chinese micro-blog texts works poorly because micro-blogs are more social, shorter, and more closely related to one another. Building on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to aid topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into consideration. As a result, MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for inference in the model. Experimental results on real datasets show that MR-LDA offers an effective solution to text mining for Chinese micro-blogs.
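The two ingredients named in the abstract, aggregating short posts into longer documents and Gibbs-sampling inference, can be illustrated with a minimal sketch. This is not the authors' MR-LDA: it is standard LDA with a collapsed Gibbs sampler, and it omits the relation-aware part of the model, which the abstract does not specify. The grouping key, the function names `aggregate_posts` and `gibbs_lda`, the hyperparameter values, and the toy posts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge tokenized short posts that share a group key into one pseudo-document.

    The group key (e.g. an author or conversation id) is an assumption;
    the abstract does not say how micro-blogs are aggregated.
    """
    groups = defaultdict(list)
    for key, tokens in posts:
        groups[key].extend(tokens)
    return dict(groups)


def gibbs_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for standard LDA (not the relation-aware MR-LDA)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, v)
    topic_total = [0] * num_topics                            # n(k)
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized full conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, vocab


# Toy example: four short posts from two hypothetical users.
posts = [
    ("user_a", ["match", "goal", "team"]),
    ("user_a", ["team", "coach"]),
    ("user_b", ["stock", "market"]),
    ("user_b", ["market", "bank", "stock"]),
]
pseudo_docs = aggregate_posts(posts)
doc_topic, topic_word, vocab = gibbs_lda(list(pseudo_docs.values()), num_topics=2)
```

Aggregation gives each pseudo-document more word co-occurrences than any single post, which is the short-text problem the abstract refers to. After sampling, `doc_topic[d]` holds the number of tokens in pseudo-document `d` assigned to each topic.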
Pages: 100-113
Page count: 14
Related papers
50 in total
  • [1] SHORT TEXT CLASSIFICATION BASED ON LDA TOPIC MODEL
    Chen, Qiuxing
    Yao, Lixiu
    Yang, Jie
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016, : 749 - 753
  • [2] Sentiment Classification of Crowdsourcing Participants' Reviews Text Based on LDA Topic Model
    Huang, Yanrong
    Wang, Rui
    Huang, Bin
    Wei, Bo
    Zheng, Shu Li
    Chen, Min
    IEEE ACCESS, 2021, 9 : 108131 - 108143
  • [3] Enhancing Big Social Media Data Quality for Use in Short-Text Topic Modeling
    Murshed, Belal Abdullah Hezam
    Abawajy, Jemal
    Mallappa, Suresha
    Saif, Mufeed Ahmed Naji
    Al-Ghuribi, Sumaia Mohammed
    Ghanem, Fahd A.
    IEEE ACCESS, 2022, 10 : 105328 - 105351
  • [4] Short text classification using semantically enriched topic model
    Uddin, Farid
    Chen, Yibo
    Zhang, Zuping
    Huang, Xin
    JOURNAL OF INFORMATION SCIENCE, 2024,
  • [5] Short text data model of secondary equipment faults in power grids based on LDA topic model and convolutional neural network
    Wei, Wei
    Nan, Dongliang
    Zhang, Lu
    Zhou, Jie
    Wang, Lichao
    Tang, Xiaobing
    2020 35TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2020, : 156 - 160
  • [6] Social emotion classification of short text via topic-level maximum entropy model
    Rao, Yanghui
    Xie, Haoran
    Li, Jun
    Jin, Fengmei
    Wang, Fu Lee
    Li, Qing
    INFORMATION & MANAGEMENT, 2016, 53 (08) : 978 - 986
  • [7] LDA-PSTR: A Topic Modeling Method for Short Text
    Zhou, Kai
    Yang, Qun
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 339 - 352
  • [8] Multi-LDA hybrid topic model with boosting strategy and its application in text classification
    Wang Yongliang
    Guo Qiao
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 4802 - 4806
  • [9] Improvement of LDA Topic Mining Algorithm and Its Application in Short Text
    Li, Kai
    Li, Chunmei
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [10] An Efficient Framework by Topic Model for Multi-label Text Classification
    Sun, Wei
    Ran, Xiangying
    Luo, Xiangyang
    Wang, Chongjun
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,