Text categorization based on dissimilarity representation and prototype selection

Cited by: 3
Authors
Pinheiro, Roberto H. W. [1]
Cavalcanti, George D. C. [1]
Ren, Tsang Ing [1]
Affiliation
[1] Univ Fed Pernambuco UFPE, Ctr Informat CIn, Av Jornalista Anibal Fernandes S-N, BR-50740560 Recife, PE, Brazil
DOI
10.1109/BRACIS.2015.28
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bag-of-Words is the most widely used representation in text categorization; however, it produces sparse, high-dimensional feature vectors and has a high feature-to-instance ratio. Feature selection is the most common approach to alleviating these problems, but it does not solve all of them, and information is lost in the process. In this paper, we propose a method based on dissimilarity representation and prototype selection to address these problems. Dissimilarity representation mitigates the problems of Bag-of-Words, and prototype selection is used to choose a smaller representation set, increasing the benefits of the dissimilarity representation. The experimental study evaluated the effectiveness of the proposed method on four text categorization databases (RCV1, Reuters, TDT2, and WebKB) using Support Vector Machines. The proposed method reduces the number of features by 79% on average and achieves better or similar results in 84% of the cases compared with the Bag-of-Words approach.
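As a rough illustration of the two ideas in the abstract, the sketch below represents each document by its dissimilarities to a small set of prototype documents instead of by its Bag-of-Words vector, so the number of features equals the number of prototypes rather than the vocabulary size. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the dissimilarity measure or the prototype selection algorithm, so cosine dissimilarity and random per-class selection (the hypothetical helper `select_prototypes`) are stand-ins, and the toy corpus is invented for illustration.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector stored sparsely as a Counter (simple Bag-of-Words)."""
    return Counter(text.lower().split())


def cosine_dissimilarity(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty documents as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def select_prototypes(docs, labels, per_class, seed=0):
    """Hypothetical stand-in for prototype selection (assumption, not the
    paper's algorithm): randomly pick `per_class` training docs per class."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    prototypes = []
    for label in sorted(by_class):  # deterministic class order
        members = by_class[label]
        prototypes.extend(rng.sample(members, min(per_class, len(members))))
    return prototypes


def dissimilarity_representation(doc, prototypes):
    """Map a document to its vector of dissimilarities to the prototypes."""
    return [cosine_dissimilarity(doc, p) for p in prototypes]


# Toy usage with an invented corpus.
train_texts = [
    "stocks rise on strong earnings",
    "market shares fall after earnings",
    "team wins the football match",
    "coach praises football team",
]
train_labels = ["finance", "finance", "sports", "sports"]
docs = [bag_of_words(t) for t in train_texts]
protos = select_prototypes(docs, train_labels, per_class=1)
vocab = set().union(*docs)
vec = dissimilarity_representation(bag_of_words("football team wins again"), protos)
print(len(vocab), len(vec))  # -> 16 2 (vocabulary size vs. dissimilarity features)
```

Any classifier, such as the Support Vector Machines used in the paper, could then be trained on these dissimilarity vectors; a more informed prototype selection method would replace the random choice made here.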
Pages: 163–168
Number of pages: 6