Feature Extension for Chinese Short Text Classification Based on Topical N-Grams

Times cited: 0

Authors:

Sun, Baoshan [1]

Zhao, Peng [1]

Affiliations:

[1] Tianjin Polytech Univ, Sch Comp Sci & Software Engn, Tianjin, Peoples R China

Source:

2017 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2017) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

Topical N-Grams; LDA; Short Texts Classification; Feature Extension; SVM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Because of feature sparseness, conventional text classification methods rarely perform well on short texts. This paper presents a novel feature extension method based on the Topical N-Grams (TNG) model to address this problem. TNG infers not only the distribution of unigram words but also the distribution of phrases for each topic. We build a feature extension library using the TNG algorithm. Based on the original features of each short text, we compute its topic tendency. According to this topic tendency, appropriate candidate words and phrases are selected from the feature extension library and added to the original short text. After extending the features, we use the LDA and SVM algorithms to classify the expanded short texts and use precision, recall, and F1-score to evaluate classification performance. The results show that our method can significantly improve classification performance.
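
The extension step described in the abstract (compute a topic tendency, then pull candidate words and phrases from a feature extension library) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `EXTENSION_LIBRARY`, `WORD_TOPIC_PROB`, `topic_tendency`, and `extend_features`, the scoring rule (a normalized sum of per-word topic probabilities), the thresholds, and all probability values are hypothetical. In the paper, the library would come from a trained TNG model.

```python
from collections import defaultdict

# Hypothetical toy "feature extension library": for each topic, candidate
# words and phrases with topic probabilities. All values are invented for
# illustration; a trained TNG model would supply them in practice.
EXTENSION_LIBRARY = {
    "sports": [("football match", 0.30), ("league", 0.20), ("coach", 0.10)],
    "finance": [("stock market", 0.35), ("interest rate", 0.25), ("bank", 0.15)],
}

# Hypothetical per-word topic probabilities used to score a short text.
WORD_TOPIC_PROB = {
    "goal": {"sports": 0.9, "finance": 0.1},
    "team": {"sports": 0.8, "finance": 0.2},
    "shares": {"sports": 0.05, "finance": 0.95},
}


def topic_tendency(tokens):
    """Assumed scoring rule: sum per-word topic probabilities, then normalize."""
    scores = defaultdict(float)
    for tok in tokens:
        for topic, p in WORD_TOPIC_PROB.get(tok, {}).items():
            scores[topic] += p
    total = sum(scores.values())
    # Texts with no known words get an empty tendency and are left unextended.
    return {t: s / total for t, s in scores.items()} if total else {}


def extend_features(tokens, top_k=2, min_tendency=0.5):
    """Append the top_k library entries of each sufficiently likely topic."""
    extended = list(tokens)
    for topic, weight in topic_tendency(tokens).items():
        if weight >= min_tendency:
            ranked = sorted(EXTENSION_LIBRARY.get(topic, []),
                            key=lambda item: item[1], reverse=True)
            extended.extend(term for term, _ in ranked[:top_k])
    return extended


if __name__ == "__main__":
    print(extend_features(["goal", "team"]))
```

The extended token lists would then be vectorized and passed to a classifier such as an SVM. The `min_tendency` threshold is one possible way to avoid adding candidates from weakly related topics.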

Pages: 477-482

Page count: 6
