Improving text categorization using the importance of words in different categories

Cited by: 0
Authors
Deng, ZH [1]
Zhang, M
Affiliations
[1] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Source
COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS | 2005 / Vol. 3801
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their context. To classify text documents, we must evaluate the values of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, called TF*IDF for short. Because a word plays a different role in documents of different categories, its value should be measured with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories and a new framework for text categorization. To verify the effectiveness of the new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that the new method yields a significant improvement on all three collections.
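As a point of reference, the conventional TF*IDF weighting that the abstract describes as the baseline can be sketched as follows. This is not the method proposed in the paper, whose details the abstract does not give. The function name `tf_idf`, the use of raw counts for term frequency, the natural logarithm for the inverse document frequency, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["stock", "market", "stock"],
    ["market", "football"],
    ["football", "match"],
]
weights = tf_idf(docs)
```

Note that this weight depends only on the document and the whole collection: a word gets the same IDF no matter which category a document belongs to, which is the limitation the abstract points to.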
Pages: 458 - 463
Number of pages: 6