FISA: Feature-based instance selection for imbalanced text classification

Cited by: 0
Authors
Sun, Aixin [1]
Lim, Ee-Peng
Benatallah, Boualem
Hassan, Mahbub
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] Univ New S Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Support Vector Machine (SVM) classifiers are widely used in text classification, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber positive ones. A generic algorithm, FISA (Feature-based Instance Selection Algorithm), is proposed to select only a subset of the negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, methods based on FISA used only 35% of the negative training examples and 60% of the learning time, yet delivered much better classification accuracy than methods that used all negative training documents.
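The abstract does not state FISA's actual selection criterion. Purely as an illustration of the general idea (feature-based selection of a subset of negative documents), the Python sketch below assumes a hypothetical criterion: rank each negative document by how many of the positive class's most frequent terms it contains, then keep the top fraction. The functions `positive_features` and `select_negatives` and the parameters `top_k` and `keep_ratio` are hypothetical names introduced here; this is not the paper's algorithm. The default `keep_ratio=0.35` only echoes the 35% figure reported above.

```python
import math
from collections import Counter


def positive_features(pos_docs, top_k):
    """Hypothetical helper: the top_k terms by document frequency in the positive docs."""
    df = Counter()
    for doc in pos_docs:
        df.update(set(doc))  # count each term at most once per document
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:top_k]}


def select_negatives(pos_docs, neg_docs, keep_ratio=0.35, top_k=1000):
    """Return sorted indices of the negative docs to keep for SVM training.

    Assumed (not FISA's published) criterion: a negative document's score is
    the number of distinct positive-class feature terms it contains.
    """
    if not neg_docs:
        return []
    feats = positive_features(pos_docs, top_k)
    scored = [(len(set(doc) & feats), i) for i, doc in enumerate(neg_docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest overlap first
    n_keep = max(1, math.ceil(keep_ratio * len(neg_docs)))
    return sorted(i for _, i in scored[:n_keep])


if __name__ == "__main__":
    pos = [["ball", "goal", "team"], ["team", "score", "goal"]]
    neg = [["stock", "market"], ["team", "budget"], ["goal", "team", "coach"],
           ["weather", "rain"], ["market", "goal"]]
    print(select_negatives(pos, neg, keep_ratio=0.4))
```

Keeping negatives that share vocabulary with the positive class follows the common intuition that such documents lie closer to the SVM decision boundary and are more likely to become support vectors; under this assumption, the retained negatives would be combined with all positive documents to train the classifier.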
Pages: 250-254
Number of pages: 5
Related papers
50 in total
  • [1] Optimal Feature Selection for Imbalanced Text Classification
    Khurana, Anshu
    Verma, Om Prakash
    [J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 135 - 147
  • [2] Comparison of metrics for feature selection in imbalanced text classification
    Ogura, Hiroshi
    Amano, Hiromi
    Kondo, Masato
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 4978 - 4989
  • [3] Feature-Based Diversity Optimization for Problem Instance Classification
    Gao, Wanru
    Nallaperuma, Samadhi
    Neumann, Frank
    [J]. PARALLEL PROBLEM SOLVING FROM NATURE - PPSN XIV, 2016, 9921 : 869 - 879
  • [4] Feature-Based Diversity Optimization for Problem Instance Classification
    Gao, Wanru
    Nallaperuma, Samadhi
    Neumann, Frank
    [J]. EVOLUTIONARY COMPUTATION, 2021, 29 (01) : 107 - 128
  • [5] Feature-Based Subjectivity Classification of Filipino Text
    Regalado, Ralph Vincent J.
    Cheng, Charibeth K.
    [J]. 2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012 : 57 - 60
  • [6] Cluster-Based Instance Selection for the Imbalanced Data Classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II, 2018, 11056 : 191 - 200
  • [7] A Classification Method Based on Feature Selection for Imbalanced Data
    Liu, Yi
    Wang, Yanzhen
    Ren, Xiaoguang
    Zhou, Hao
    Diao, Xingchun
    [J]. IEEE ACCESS, 2019, 7 : 81794 - 81807
  • [8] Imbalanced Data Classification Based on Feature Selection Techniques
    Ksieniewicz, Pawel
    Wozniak, Michal
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2018), PT II, 2018, 11315 : 296 - 303
  • [9] Evolutionary instance selection for text classification
    Tsai, Chih-Fong
    Chen, Zong-Yao
    Ke, Shih-Wen
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2014, 90 : 104 - 113
  • [10] Combination of Feature-based and Instance-based methods for Domain Adaptation in Sentiment Classification
    Bai, Jing
    Cao, Rui
    Ma, Wen
    Shinnou, Hiroyuki
    [J]. 2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2019,