Feature Extension for Chinese Short Text Classification Based on Topical N-Grams

Cited by: 0
Authors
Sun, Baoshan [1]
Zhao, Peng [1]
Affiliations
[1] Tianjin Polytech Univ, Sch Comp Sci & Software Engn, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Topical N-Grams; LDA; Short Texts Classification; Feature Extension; SVM;
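DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract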
Because short texts yield sparse features, conventional text classification methods rarely perform well on them. This paper presents a novel feature extension method based on the Topical N-Grams (TNG) model to address this problem. The model infers not only the distribution of unigram words but also the distribution of phrases for each topic. Using the TNG algorithm, we build a feature extension library. Based on the original features of each short text, we compute its topic tendency. According to this tendency, appropriate candidate words and phrases are selected from the feature extension library and added to the original short text. After extending the features, we use the LDA and SVM algorithms to classify the expanded short texts, and we evaluate classification with precision, recall, and F1-score. The results show that our method can significantly improve classification performance.
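The extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how topic tendency is scored or how many candidates are added, so the scoring rule, the `top_k` and `threshold` parameters, the function names, and the toy `LIBRARY` (written by hand here rather than inferred by TNG, and using English tokens for readability) are all assumptions made for illustration.

```python
# Hypothetical feature extension library: topic id -> {word or phrase: weight}.
# In the paper this library is built with TNG; here it is hand-written.
LIBRARY = {
    0: {"stock": 0.30, "stock market": 0.25, "fund": 0.20, "investment": 0.15},
    1: {"match": 0.35, "team": 0.25, "football league": 0.20, "goal": 0.10},
}


def topic_tendency(tokens, library):
    """Score each topic by the summed weight of the text's tokens (assumed rule)."""
    scores = {
        topic: sum(terms.get(tok, 0.0) for tok in tokens)
        for topic, terms in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No token of the text occurs in the library: no tendency at all.
        return {topic: 0.0 for topic in library}
    return {topic: score / total for topic, score in scores.items()}


def extend_features(tokens, library, top_k=2, threshold=0.5):
    """Append the top_k highest-weight unseen terms of the dominant topic."""
    tendency = topic_tendency(tokens, library)
    best = max(tendency, key=tendency.get)
    if tendency[best] < threshold:
        # Tendency is too weak to extend safely; return the text unchanged.
        return list(tokens)
    present = set(tokens)
    ranked = sorted(library[best].items(), key=lambda kv: kv[1], reverse=True)
    extra = [term for term, _ in ranked if term not in present][:top_k]
    return list(tokens) + extra


if __name__ == "__main__":
    print(extend_features(["stock", "rises"], LIBRARY))
```

The expanded token lists would then be passed to whatever classifier is used downstream; the threshold guards against extending a text whose topic is ambiguous, which is a design choice of this sketch rather than something stated in the abstract.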
Pages: 477-482
Page count: 6