Improving text categorization using the importance of words in different categories

Cited by: 0
Authors
Deng, ZH [1]
Zhang, M
Affiliations
[1] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
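Source
COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS | 2005 / Vol. 3801
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract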
Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their context. To classify text documents, we must evaluate the values of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, called TF*IDF for short. Since a word plays a different role in documents of different categories, its value should be measured with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories and a new framework for text categorization. To verify the efficiency of the new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that the new method yields a significant improvement on all three collections.
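For orientation, the conventional baseline the abstract refers to (TF*IDF word values fed to a k-NN classifier) can be sketched as follows. This is a minimal illustration and not the paper's proposed category-dependent weighting, which the abstract does not specify. The function names, the natural-log form of IDF, the cosine similarity measure, and the majority vote are all assumptions chosen for the sketch.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency per term: log(N / df). Assumed variant."""
    n = len(docs)
    # Count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc, idf):
    """TF*IDF weights for one tokenized document; unseen terms are dropped."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Note that in this baseline a word receives the same IDF regardless of which category a document belongs to; that category-independence is the limitation the abstract says the proposed method addresses.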
Pages: 458 - 463
Number of pages: 6