Text Categorization using Weighted Hyper Rectangular Keyword Extraction

Cited by: 1
Authors
Hassaine, Abdelaali [1]
Safi, Zeineb [1]
Otaibi, Jameela [1]
Jaoua, Ali [1]
Affiliations
[1] Qatar Univ, Coll Engn, Comp Sci & Engn Dept, Doha, Qatar
Keywords
CLASSIFICATION
DOI
10.1109/AICCSA.2017.102
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text categorization is an important research field with many current applications. It is usually performed in two steps: feature extraction and classification. In the feature extraction step, discriminating keywords are extracted in order to distinguish between different categories of documents. In the classification step, the extracted keywords are fed to a classifier in order to detect the category of each document. In this paper, we use the hyper rectangle method, which represents the corpus of documents as a binary relation in which documents correspond to objects and words to attributes. The hyper rectangle method extracts a tree of keywords such that the most discriminative keywords are at the top levels and the less discriminative keywords are at the deeper levels. We are particularly interested in studying different proposed weighting metrics that yield different orderings of keywords, and in how these weighting metrics affect categorization performance. For the classification step, we used both logistic regression and random forest classifiers. We tested our method on both the 20 Newsgroups dataset and the Reuters R8 dataset. Our method achieves high performance on both datasets and competes very well with state-of-the-art methods.
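The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the hyper rectangle method and does not build a keyword tree: it only shows a corpus stored as a binary relation (documents as objects, words as attributes), a stand-in weighting metric used to order keywords, and the binary feature vectors that would then be passed to a classifier. The function names, the toy corpus, and the weighting formula (the largest gap between a word's document frequency inside a category and outside it) are all assumptions made for illustration.

```python
def build_relation(docs):
    """Binary relation: each document id maps to the set of words it contains."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}


def keyword_weights(relation, labels):
    """Hypothetical weighting metric (an assumption, not taken from the paper).

    For each word, take the largest difference between its document
    frequency inside a category and its document frequency outside it.
    """
    categories = set(labels.values())
    vocabulary = set().union(*relation.values())
    weights = {}
    for word in vocabulary:
        best = float("-inf")
        for cat in categories:
            inside = [d for d in relation if labels[d] == cat]
            outside = [d for d in relation if labels[d] != cat]
            p_in = sum(word in relation[d] for d in inside) / len(inside)
            p_out = (
                sum(word in relation[d] for d in outside) / len(outside)
                if outside
                else 0.0
            )
            best = max(best, p_in - p_out)
        weights[word] = best
    return weights


def top_keywords(weights, k):
    """Order keywords by decreasing weight; ties are broken alphabetically."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def to_features(words, keywords):
    """Binary feature vector: 1 if the document contains the keyword, else 0."""
    return [int(keyword in words) for keyword in keywords]


# Toy corpus, invented for illustration only.
docs = {
    "d1": "the match ended with a late goal",
    "d2": "the striker scored another goal",
    "d3": "the market closed with higher stock prices",
    "d4": "investors sold stock after the report",
}
labels = {"d1": "sport", "d2": "sport", "d3": "finance", "d4": "finance"}

relation = build_relation(docs)
weights = keyword_weights(relation, labels)
keywords = top_keywords(weights, 2)
features = {doc_id: to_features(words, keywords) for doc_id, words in relation.items()}
```

In a full pipeline, the vectors in `features` would be the input to a classifier such as logistic regression or a random forest. Swapping `keyword_weights` for a different metric changes the keyword ordering, which is the effect the abstract sets out to study.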
Pages: 959-965
Page count: 7