An efficient text categorization algorithm based on category memberships

Authors
Deng, ZH [1]
Tang, SW [1]
Zhang, M [1]
Affiliation
[1] Peking Univ, Sch Elect Engn & Comp Sci, Natl Lab Machine Percept, Beijing 100871, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is the process of automatically assigning predefined categories to free-text documents. Although a large number of text classification algorithms exist, most of them are either inefficient or too complex. In this paper, we propose the concept of category memberships, which represent the degrees to which words belong to categories. Based on category memberships, we present a simple but efficient algorithm. To evaluate the new algorithm, we conducted experiments on the Newsgroup_18828 text collection, comparing it with Naive Bayes and k-NN. Experimental results show that our algorithm outperforms Naive Bayes and k-NN if a suitable category membership function is adopted.
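The abstract does not define the category membership function or the decision rule, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes (hypothetically) that a word's membership in a category is the share of that word's training occurrences falling in the category, and that a document is assigned to the category with the largest summed membership over its words. The names `train_memberships` and `classify` and the toy training data are illustrative.

```python
from collections import Counter, defaultdict


def train_memberships(labeled_docs):
    """Estimate membership[word][category] from (tokens, category) pairs.

    Assumption: membership is the share of a word's occurrences that fall
    in a category. The abstract does not specify the actual function.
    """
    counts = defaultdict(Counter)
    for tokens, category in labeled_docs:
        for word in tokens:
            counts[word][category] += 1

    memberships = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        memberships[word] = {c: n / total for c, n in per_category.items()}
    return memberships


def classify(tokens, memberships):
    """Return the category with the largest summed membership, or None."""
    scores = Counter()
    for word in tokens:
        # Words never seen in training contribute nothing.
        for category, degree in memberships.get(word, {}).items():
            scores[category] += degree
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
training = [
    ("the team won the match".split(), "sports"),
    ("the striker scored a goal".split(), "sports"),
    ("the cpu runs the code".split(), "computing"),
    ("compile the code on the server".split(), "computing"),
]
memberships = train_memberships(training)
predicted = classify("the striker won".split(), memberships)
```

Under these assumptions, training is a single pass over the corpus and classification costs time proportional to the document length times the number of categories each word appears in, which is consistent with the abstract's emphasis on simplicity and efficiency.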
Pages: 374-382
Number of pages: 9