Text Categorization using Weighted Hyper Rectangular Keyword Extraction

Cited by: 1
Authors
Hassaine, Abdelaali [1]
Safi, Zeineb [1]
Otaibi, Jameela [1]
Jaoua, Ali [1]
Affiliations
[1] Qatar Univ, Coll Engn, Comp Sci & Engn Dept, Doha, Qatar
Keywords
CLASSIFICATION;
DOI
10.1109/AICCSA.2017.102
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text categorization is an important research field with many current applications. It is usually performed in two steps: feature extraction and classification. In the feature extraction step, discriminating keywords are extracted in order to distinguish between different categories of documents. In the classification step, the extracted keywords are fed to a classifier in order to detect the category of each document. In this paper, we use the hyper rectangle method, which represents the corpus of documents as a binary relation in which documents correspond to objects and words to attributes. The hyper rectangle method extracts a tree of keywords such that the most discriminative keywords are at the top levels and less discriminative keywords are at the deeper levels. We are particularly interested in studying different proposed weighting metrics that yield different orderings of keywords, and in how these weighting metrics impact categorization performance. For the classification step, we used both logistic regression and random forest classifiers. We tested our method on both the 20 newsgroups dataset and the Reuters R8 dataset. Our method achieves high performance on both datasets and competes very well with state-of-the-art methods.
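The two-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' hyper rectangle algorithm or any of their weighting metrics: the weight used here (the gap between a word's highest and lowest per-category document frequency) is a simple stand-in chosen for illustration, and the function names and toy corpus are hypothetical. The sketch only shows the general shape: a binary document-word relation, a weight-based keyword ordering, and binary feature vectors that could then be passed to a classifier such as logistic regression or a random forest.

```python
from collections import defaultdict

def build_binary_relation(docs):
    """Binary relation: each document (object) maps to the set of words (attributes) it contains."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

def keyword_weights(relation, labels):
    """Stand-in weight (an assumption, NOT the paper's metric): for each word,
    the gap between its highest and lowest per-category document frequency."""
    cat_sizes = defaultdict(int)
    for doc_id in relation:
        cat_sizes[labels[doc_id]] += 1
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, words in relation.items():
        for word in words:
            counts[word][labels[doc_id]] += 1
    weights = {}
    for word, per_cat in counts.items():
        freqs = [per_cat.get(cat, 0) / size for cat, size in cat_sizes.items()]
        weights[word] = max(freqs) - min(freqs)
    return weights

def top_keywords(weights, k):
    """Order keywords by decreasing weight, breaking ties alphabetically, and keep the first k."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]

def to_features(relation, keywords):
    """Binary feature vector per document: 1 if the keyword occurs in it, else 0."""
    return {d: [1 if kw in words else 0 for kw in keywords]
            for d, words in relation.items()}

# Hypothetical toy corpus with two categories.
docs = {
    "d1": "the match ended with a late goal",
    "d2": "the team scored a goal in the match",
    "d3": "the market closed with a rally in stocks",
    "d4": "the stocks fell as the market dipped",
}
labels = {"d1": "sport", "d2": "sport", "d3": "finance", "d4": "finance"}

relation = build_binary_relation(docs)
weights = keyword_weights(relation, labels)
keywords = top_keywords(weights, 4)
features = to_features(relation, keywords)
print(keywords)
print(features)
```

Changing the weighting function changes the keyword ordering and therefore which features the classifier sees, which is the effect the abstract says the paper studies.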
Pages: 959-965
Number of pages: 7
Related papers
  • [1] Text Categorization Using Hyper Rectangular Keyword Extraction: Application to News Articles Classification
    Hassaine, Abdelaali
    Mecheter, Souad
    Jaoua, Ali
    RELATIONAL AND ALGEBRAIC METHODS IN COMPUTER SCIENCE (RAMICS 2015), 2015, 9348 : 312 - 325
  • [2] Keyword extraction for text categorization
    An, JY
    Chen, YPP
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON ACTIVE MEDIA TECHNOLOGY (AMT 2005), 2005, : 556 - 561
  • [3] Keyword extraction strategy for item banks text categorization
    Nuntiyagul, Atorn
    Naruedomkul, Kanlaya
    Cercone, Nick
    Wongsawang, Damras
    COMPUTATIONAL INTELLIGENCE, 2007, 23 (01) : 28 - 44
  • [4] Keyword Combination Extraction in Text Categorization Based on Ant Colony Optimization
    Yu, Zi-jun
    Wu, Wei-gang
    Xiao, Jing
    Zhang, Jun
    Huang, Rui-Zhang
    Liu, Ou
    2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION, 2009, : 430 - +
  • [5] Text Categorization Using SVM with Exponent Weighted ACO
    La Lei
    Guo Qiao
    PROCEEDINGS OF THE 31ST CHINESE CONTROL CONFERENCE, 2012, : 3763 - 3768
  • [6] Text Categorization by Weighted Features
    Fu, Junfeng
    Liang, Liang
    Zheng, Jinkun
    Zhou, Xin
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 544 - 547
  • [7] Chinese Keyword Extraction Using Semantically Weighted Network
    Chen, Qian
    Jiang, Zengru
    Bian, Jinqiang
    2014 SIXTH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL 2, 2014, : 83 - 86
  • [8] Text categorization using distributional clustering and concept extraction
    He, Yifan
    Jiang, Minghu
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF THEORETICAL AND METHODOLOGICAL ISSUES, 2007, 4681 : 720 - +
  • [9] LDA-based Keyword Selection in Text Categorization
    Tasci, Serafettin
    Gungor, Tunga
    2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009, : 229 - 234
  • [10] Context and Keyword Extraction in Plain Text using a Graph Representation
    Chahine, C. Abi
    Chaignaud, N.
    Kotowicz, J. P. H.
    Pecuchet, J. P.
    SITIS 2008: 4TH INTERNATIONAL CONFERENCE ON SIGNAL IMAGE TECHNOLOGY AND INTERNET BASED SYSTEMS, PROCEEDINGS, 2008, : 692 - 696