Memetic feature selection for multilabel text categorization using label frequency difference

Cited by: 50
Authors
Lee, Jaesung [1 ]
Yu, Injun [1 ]
Park, Jaegyun [1 ]
Kim, Dae-Won [1 ]
Affiliations
[1] Chung Ang Univ, Sch Comp Sci & Engn, 221 Heukseok Dong, Seoul 06974, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Multi-label text categorization; Feature selection; Memetic search; Population-based incremental learning; NAIVE BAYES; CLASSIFICATION; ALGORITHM; SCHEME;
DOI
10.1016/j.ins.2019.02.021
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Multilabel text categorization is an important task in modern text mining applications. Text datasets comprise an excessive number of terms, which can degrade categorization accuracy. Therefore, conventional studies apply a feature selection method before text categorization. Recently, memetic feature selection methods that hybridize an evolutionary feature wrapper and a filter have gained popularity and shown promising results. However, conventional memetic text feature selection methods suffer from limited performance because the feature filter they use requires a problem transformation that degrades the search capability, resulting in unrefined feature subsets with poor accuracy. In this study, we propose an effective memetic feature selection method based on a novel feature filter that is highly specialized to multilabel text categorization. Our experiments demonstrate that the proposed method significantly outperforms several conventional methods. © 2019 Elsevier Inc. All rights reserved.
Pages: 263-280
Number of pages: 18
Related papers
50 in total
  • [1] Memetic multilabel feature selection using pruned refinement process
    Seo, Wangduk
    Park, Jaegyun
    Lee, Sanghyuck
    Moon, A-Seong
    Kim, Dae-Won
    Lee, Jaesung
    [J]. JOURNAL OF BIG DATA, 2024, 11 (01)
  • [2] Document transformation for multi-label feature selection in text categorization
    Chen, Weizhu
    Yan, Jun
    Zhang, Benyu
    Chen, Zheng
    Yang, Qiang
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 451 - +
  • [3] An extended document frequency metric for feature selection in text categorization
    Xu, Yan
    Wang, Bin
    Li, JinTao
    Jing, Hongfang
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 71 - +
  • [4] Using boosting in a multilabel text categorization problem
    Muruzábal, J
    Souto, EG
    [J]. INFORMATION TECHNOLOGY AND ORGANIZATIONS: TRENDS, ISSUES, CHALLENGES AND SOLUTIONS, VOLS 1 AND 2, 2003, : 431 - 433
  • [5] Effective memetic algorithm for multilabel feature selection using hybridization-based communication
    Seo, Wangduk
    Park, Minwoo
    Kim, Dae-Won
    Lee, Jaesung
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 201
  • [6] Using typical testors for feature selection in text categorization
    Pons-Porrata, Aurora
    Gil-Garcia, Reynaldo
    Berlanga-Llavori, Rafael
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2007, 4756 : 643 - +
  • [7] Feature selection in SVM text categorization
    Taira, H
    Haruno, M
    [J]. SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), 1999, : 480 - 486
  • [8] Relative term-frequency based feature selection for text categorization
    Yang, SM
    Wu, XB
    Deng, ZH
    Zhang, M
    Yang, DQ
    [J]. 2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 1432 - 1436
  • [9] Feature selection strategies for text categorization
    Soucy, P
    Mineau, GW
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, 2671 : 505 - 509
  • [10] Max-difference maximization criterion: a feature selection method for text categorization
    Lingbin Jin
    Li Zhang
    Lei Zhao
    [J]. Frontiers of Computer Science, 2023, 17