An efficient text categorization algorithm based on category memberships

Cited by: 0
Authors
Deng, ZH [1]
Tang, SW [1]
Zhang, M [1]
Affiliation
[1] Peking Univ, Sch Elect Engn & Comp Sci, Natl Lab Machine Percept, Beijing 100871, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is the process of automatically assigning predefined categories to free-text documents. Although a large number of text classification algorithms exist, most of them are either inefficient or too complex. In this paper, we propose the concept of category memberships, which represent the degrees to which words belong to categories. Based on category memberships, we present a simple but efficient algorithm. To evaluate the new algorithm, we conducted experiments on the Newsgroup_18828 text collection, comparing it with Naive Bayes and k-NN. Experimental results show that our algorithm outperforms Naive Bayes and k-NN if a suitable category membership function is adopted.
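The abstract does not state the membership function or the scoring rule, so the following is only a minimal sketch of how a membership-based classifier could look, not the authors' algorithm. It assumes, purely for illustration, that a word's membership in a category is the share of that word's training occurrences falling in the category, and that a document is assigned to the category with the largest summed membership. The names train_memberships and classify, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def train_memberships(labeled_docs):
    """Estimate memberships[word][category] from (tokens, category) pairs.

    Assumption (not from the paper): membership is the fraction of a
    word's training occurrences that fall in the given category.
    """
    counts = defaultdict(Counter)  # word -> Counter(category -> occurrences)
    for tokens, category in labeled_docs:
        for word in tokens:
            counts[word][category] += 1
    memberships = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        memberships[word] = {c: n / total for c, n in per_category.items()}
    return memberships


def classify(tokens, memberships):
    """Return the category with the largest summed membership, or None
    when no token of the document was seen during training."""
    scores = Counter()
    for word in tokens:
        for category, degree in memberships.get(word, {}).items():
            scores[category] += degree
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy training set, invented for illustration only.
train = [
    ("the team won the match".split(), "sports"),
    ("the player scored a goal".split(), "sports"),
    ("the cpu runs the program".split(), "computing"),
    ("install the program on disk".split(), "computing"),
]
memberships = train_memberships(train)
prediction = classify("the player won".split(), memberships)
print(prediction)
```

In this sketch, training is one pass over the corpus and classification is one lookup per token, which is consistent with the abstract's emphasis on simplicity. Swapping in a different membership function only requires changing how the per-category counts are normalized.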
Pages: 374-382
Page count: 9