SLDA-TC: A Novel Text Categorization Approach Based on Supervised Topic Model

Cited by: 0

Authors:

Tang H.-L. ^{[1,2,3]}

Dou Q.-S. ^{[1,2,3]}

Yu L.-P. ^{[1,2,3]}

Song Y.-J. ^{[1,2,3]}

Lu M.-Y. ^{[4]}

Affiliations:

[1] School of Computer Science and Technology, Shandong Technology and Business University, Yantai, 264005, Shandong

[2] Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, 264005, Shandong

[3] Key Laboratory of Intelligent Information Processing in Universities of Shandong (Shandong Technology and Business University), Yantai, 264005, Shandong

[4] Information Science and Technology College, Dalian Maritime University, Dalian, 116026, Liaoning

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2019 / Vol. 47 / No. 06

Keywords:

Gibbs sampling; Latent Dirichlet allocation; Text categorization; Topic model;

DOI:

10.3969/j.issn.0372-2112.2019.06.017

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

This paper proposes SLDA-TC, a novel text categorization model based on a supervised topic model. The model introduces a new parameter that represents the topic-category probability distribution, and the SLDA-TC-Gibbs sampling algorithm is presented. At each iteration, the latent topic of a word is sampled using only the other training documents that belong to the same category as the document in which the word occurs; a theoretical proof of this is given. In the SLDA-TC model, the number of topics is only slightly larger than the number of categories. Experimental results demonstrate that the SLDA-TC model improves both the accuracy and the speed of text classification compared with the LDA-TC and SVM algorithms. © 2019, Chinese Institute of Electronics. All rights reserved.

Citation

Pages: 1300-1308

Page count: 8
