Improving text categorization using the importance of words in different categories

Cited by: 0
Authors
Deng, ZH [1]
Zhang, M
Affiliations
[1] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their context. To classify text documents, we must evaluate the values of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, called TF*IDF for short. Because a word plays a different role in documents of different categories, its value should be measured with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories and a new framework for text categorization. To verify the efficiency of our new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that our new method yields a significant improvement on all three text collections.
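For orientation, the conventional baseline that the abstract refers to (TF*IDF weighting followed by k-NN classification) might be sketched as follows. This is not the category-aware measure proposed in the paper, which the abstract does not specify. The function names, the logarithmic form of IDF, the cosine similarity, the majority vote, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency per word, assuming idf = log(N / df)."""
    n_docs = len(docs)
    # df[w] = number of documents that contain w at least once.
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tf_idf_vector(doc, idf):
    """Sparse TF*IDF vector: raw term frequency times IDF."""
    tf = Counter(doc)
    # Words never seen in the training collection are ignored.
    return {word: tf[word] * idf[word] for word in tf if word in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy collection (hypothetical data, already tokenized).
train_docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "price"],
    ["market", "bank", "price"],
]
train_labels = ["sports", "sports", "finance", "finance"]

idf = idf_table(train_docs)
train_vecs = [tf_idf_vector(doc, idf) for doc in train_docs]
query_vec = tf_idf_vector(["goal", "team", "ball"], idf)
predicted = knn_predict(query_vec, train_vecs, train_labels, k=3)
print(predicted)
```

In this sketch a word receives one weight per document regardless of category, which is the limitation the abstract points to: the same IDF value is used whether the word appears in a sports document or a finance document.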
Pages: 458-463
Page count: 6