Term-weighting learning via genetic programming for text classification

Cited by: 48
Authors
Escalante, Hugo Jair [1]
Garcia-Limon, Mauricio A. [1]
Morales-Reyes, Alicia [1]
Graff, Mario [2]
Montes-y-Gomez, Manuel [1]
Morales, Eduardo F. [1]
Martinez-Carranza, Jose [1]
Affiliations
[1] Inst Nacl Astrofis Opt & Electr, Dept Comp Sci, Puebla 72840, Mexico
[2] INFOTEC Ctr Invest & Innovac Tecnol Informac & Co, Aguascalientes, Mexico
Keywords
Term-weighting learning; Genetic programming; Text mining; Representation learning; Bag of words; SCHEMES;
DOI
10.1016/j.knosys.2015.03.025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Whereas acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), defining TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose in this article a genetic program that aims at learning effective TWSs that can improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent works. Further, we show that TWSs learned from a specific domain can be effectively used for other tasks. (C) 2015 Elsevier B.V. All rights reserved.
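As a rough illustration of what the abstract means by a TWS built from basic units, the sketch below weights a toy corpus with the Boolean and term-frequency schemes it names, plus TF-IDF and one hand-written combination. The function names, the choice of basic units (term frequency, document frequency, corpus size), and the `combined` expression are assumptions made for illustration; they are not taken from the paper, and no genetic program is run here.

```python
import math
from collections import Counter


def basic_units(docs):
    """Per-document term counts, sorted vocabulary, and document frequencies."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    return counts, vocab, df


def weight(docs, scheme):
    """Represent each document as a vector using scheme(tf, df, n_docs).

    Terms absent from a document always get weight 0.0.
    """
    counts, vocab, df = basic_units(docs)
    n = len(docs)
    vectors = [
        [scheme(c[t], df[t], n) if c[t] else 0.0 for t in vocab]
        for c in counts
    ]
    return vocab, vectors


def boolean(tf, df, n):
    return 1.0  # term presence only


def term_frequency(tf, df, n):
    return float(tf)  # raw count in the document


def tf_idf(tf, df, n):
    return tf * math.log(n / df)  # standard TF-IDF


def combined(tf, df, n):
    # Hypothetical expression of the kind a search over basic units
    # might produce; NOT a scheme reported in the paper.
    return math.sqrt(tf) * math.log(1 + n / df)


docs = ["spam spam offer", "meeting agenda offer"]
for scheme in (boolean, term_frequency, tf_idf, combined):
    vocab, vectors = weight(docs, scheme)
    print(scheme.__name__, vocab, vectors)
```

Each scheme is an ordinary function of the same basic units, so swapping one expression for another changes the document representation without touching the classifier that consumes it.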
Pages: 176-189
Number of pages: 14
Related papers
50 in total
  • [1] Term-weighting learning via genetic programming for text classification
    Escalante, Hugo Jair
    García-Limón, Mauricio A.
    Morales-Reyes, Alicia
    Graff, Mario
    Montes-y-Gómez, Manuel
    Morales, Eduardo F.
    Martínez-Carranza, José
    Knowledge-Based Systems, 2015, 83 : 176 - 189
  • [2] Model-induced term-weighting schemes for text classification
    Kim, Hyun Kyung
    Kim, Minyoung
    APPLIED INTELLIGENCE, 2016, 45 (01) : 30 - 43
  • [3] Model-induced term-weighting schemes for text classification
    Hyun Kyung Kim
    Minyoung Kim
    Applied Intelligence, 2016, 45 : 30 - 43
  • [4] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Mounia Haddoud
    Aïcha Mokhtari
    Thierry Lecroq
    Saïd Abdeddaïm
    Knowledge and Information Systems, 2016, 49 : 909 - 931
  • [5] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Haddoud, Mounia
    Mokhtari, Aicha
    Lecroq, Thierry
    Abdeddaim, Said
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 49 (03) : 909 - 931
  • [6] TERM-WEIGHTING APPROACHES IN AUTOMATIC TEXT RETRIEVAL
    SALTON, G
    BUCKLEY, C
    INFORMATION PROCESSING & MANAGEMENT, 1988, 24 (05) : 513 - 523
  • [7] A Novel Term-weighting Approach in Text Classification over Skewed Data Sets
    Sun, Tieli
    Zhang, Yujie
    Yang, Fengqin
    Yang, Xiquan
    Jiang, Yingjie
    Wang, Zibing
    Li, Kuiwu
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2010, 13 (03) : 621 - 633
  • [8] Term-Weighting in Information Retrieval using Genetic Programming: A three stage process
    Cummins, Ronan
    O'Riordan, Colm
    ECAI 2006, PROCEEDINGS, 2006, 141 : 793 - 794
  • [9] Hybridized term-weighting method for Dark Web classification
    Sabbah, Thabit
    Selamat, Ali
    Selamat, Md. Hafiz
    Ibrahim, Roliana
    Fujita, Hamido
    NEUROCOMPUTING, 2016, 173 : 1908 - 1926
  • [10] A generic multi-level framework for building term-weighting schemes in text classification
    Tang, Zhong
    COMPUTER JOURNAL, 2024, : 3042 - 3055