Feature Extension for Chinese Short Text Classification Based on Topical N-Grams

Cited by: 0

Authors:

Sun, Baoshan [1]

Zhao, Peng [1]

Affiliations:

[1] Tianjin Polytech Univ, Sch Comp Sci & Software Engn, Tianjin, Peoples R China

Source:

2017 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2017) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

Topical N-Grams; LDA; Short Texts Classification; Feature Extension; SVM;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Because of feature sparseness, conventional text classification methods rarely perform well on short texts. This paper presents a novel feature extension method based on the Topical N-Grams (TNG) model to address this problem. The TNG model infers not only the unigram word distribution but also the phrase distribution for each topic. We build a feature extension library using the TNG algorithm. Based on the original features of each short text, we compute its topic tendency. According to the topic tendency, appropriate candidate words and phrases are selected from the feature extension library and added to the original short text. After extending the features, we use the LDA and SVM algorithms to classify the expanded short texts, and we evaluate the classification with precision, recall, and F1-score. The results show that our method can significantly improve classification performance.
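The extension step described in the abstract (compute a topic tendency for a short text, then pull candidate words and phrases from the feature extension library) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy library contents and probabilities, the names `EXTENSION_LIBRARY`, `topic_tendency`, and `extend_features`, the scoring rule (sum of per-topic term probabilities, normalized), and the `top_k` and `min_tendency` parameters. In the paper, the library comes from a trained TNG model, and the exact tendency formula is not given in this record.

```python
# Hypothetical feature extension library: for each topic, candidate terms
# (unigrams and phrases) with an illustrative probability under that topic.
# In the paper this would be derived from a trained TNG model; values here are made up.
EXTENSION_LIBRARY = {
    "sports": {"football": 0.30, "world cup": 0.25, "coach": 0.10},
    "finance": {"stock market": 0.35, "interest rate": 0.20, "bank": 0.15},
}


def topic_tendency(tokens, library):
    """Assumed scoring rule: sum each topic's term probabilities over the
    tokens of the short text, then normalize so the scores sum to 1."""
    scores = {
        topic: sum(terms.get(tok, 0.0) for tok in tokens)
        for topic, terms in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No token matched any topic: no tendency can be inferred.
        return {topic: 0.0 for topic in scores}
    return {topic: score / total for topic, score in scores.items()}


def extend_features(tokens, library, top_k=2, min_tendency=0.5):
    """Append up to top_k of the most probable library terms for every topic
    whose tendency reaches min_tendency (both thresholds are assumptions)."""
    tendency = topic_tendency(tokens, library)
    extended = list(tokens)
    for topic, weight in tendency.items():
        if weight < min_tendency:
            continue
        # Rank the topic's candidate terms by probability, highest first.
        ranked = sorted(library[topic].items(), key=lambda kv: kv[1], reverse=True)
        new_terms = [term for term, _ in ranked if term not in extended][:top_k]
        extended.extend(new_terms)
    return extended


short_text = ["coach", "wins", "match"]
print(extend_features(short_text, EXTENSION_LIBRARY))
```

The extended token list would then be vectorized and passed to a classifier such as an SVM; that stage is omitted here because this record gives no details about it.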


Pages: 477-482

Page count: 6
