A probabilistic model derived term weighting scheme for text classification

Cited by: 17
Authors
Feng, Guozhong [1, 2, 3]
Li, Shaoting [4]
Sun, Tieli [1]
Zhang, Bangzuo [1]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Key Lab Intelligent Informat Proc Jilin Univ, Changchun 130117, Jilin, Peoples R China
[2] Northeast Normal Univ, Sch Math & Stat, Key Lab Appl Stat MOE, Changchun 130024, Jilin, Peoples R China
[3] Northeast Normal Univ, Inst Computat Biol, Changchun 130117, Jilin, Peoples R China
[4] Dongbei Univ Finance & Econ, Sch Stat, Dalian 116025, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Latent feature selection indicator; Matching score function; Naive Bayes; Supervised term weighting; Text classification; CATEGORIZATION; BAYES
DOI
10.1016/j.patrec.2018.03.003
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Term weighting is a text representation strategy that assigns an appropriate value to each term when the content of a textual document is transformed into a vector in the term space, with the aim of improving text classification performance. Supervised weighting methods, which use the class membership of the training documents, are naturally expected to give better results than unsupervised ones. In this paper, a new weighting scheme is proposed via a matching score function based on a probabilistic model. We introduce a latent variable that indicates whether a term carries text classification information, specify conjugate priors, and exploit the conjugacy by integrating out the latent indicator and the parameters. Non-discriminating terms can then be assigned weights close to 0. Experimental results with kNN and SVM classifiers illustrate the effectiveness of the proposed approach on both small and large text data sets. (C) 2018 Published by Elsevier B.V.
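The idea of a latent indicator with conjugate priors integrated out can be illustrated with a small sketch. This is not the paper's matching score function, which the abstract does not spell out. It assumes a simpler stand-in: a Bernoulli model of term presence per class with a Beta(a, b) prior, where the hypothetical function `discriminating_posterior` returns the posterior probability that a term's document frequency differs between classes. All names, the prior values, and the example counts are assumptions made for illustration.

```python
from math import lgamma, exp, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of one sequence of n documents, k of which
    contain the term, with the Bernoulli parameter integrated out under
    a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def discriminating_posterior(counts, prior=0.5, a=1.0, b=1.0):
    """Posterior probability that the latent indicator is 1, i.e. that the
    term has a class-specific presence rate (assumed model, not the paper's).

    counts: list of (k_c, n_c) pairs, one per class, where n_c is the number
    of training documents in class c and k_c how many contain the term.
    """
    # Indicator = 1: each class has its own presence rate.
    log_m1 = sum(log_marginal(k, n, a, b) for k, n in counts)
    # Indicator = 0: one shared presence rate for all classes.
    total_k = sum(k for k, _ in counts)
    total_n = sum(n for _, n in counts)
    log_m0 = log_marginal(total_k, total_n, a, b)
    log_odds = log(prior) - log(1.0 - prior) + log_m1 - log_m0
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


# Hypothetical counts for two classes of 100 training documents each.
w_flat = discriminating_posterior([(50, 100), (50, 100)])  # same rate in both
w_disc = discriminating_posterior([(90, 100), (5, 100)])   # very different rates
```

Under these assumptions, a term that occurs equally often in both classes receives a low weight and a term concentrated in one class receives a weight near 1. Such a weight could then scale a term-frequency vector before it is passed to a kNN or SVM classifier.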
Pages: 23-29
Number of pages: 7