FISA: Feature-based instance selection for imbalanced text classification

Cited by: 0
Authors
Sun, Aixin [1]
Lim, Ee-Peng
Benatallah, Boualem
Hassan, Mahbub
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] Univ New S Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
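Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes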
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the case where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of the negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% of the negative training examples and 60% of the learning time, methods based on FISA delivered much better classification accuracy than methods using all negative training documents.
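The abstract does not spell out how FISA chooses which negative documents to keep, so the following Python sketch is only an illustration of the general idea of feature-based selection of negatives, not the published procedure. It assumes a hypothetical heuristic: rank terms by how much more often they occur in positive documents than in negative ones, then keep only the negatives that contain at least one top-ranked term. The function name `select_negatives`, the parameter `top_k`, and the toy documents are all invented for this example.

```python
from collections import Counter


def select_negatives(positives, negatives, top_k=2):
    """Keep negatives that share a positive-indicative term.

    Illustrative heuristic only (an assumption of this sketch),
    NOT the published FISA procedure. Both inputs are non-empty
    lists of tokenized documents (lists of strings).
    """
    # Document frequency of each term in each class.
    pos_df = Counter(term for doc in positives for term in set(doc))
    neg_df = Counter(term for doc in negatives for term in set(doc))

    def score(term):
        # Positive document-frequency rate minus negative rate.
        return pos_df[term] / len(positives) - neg_df[term] / len(negatives)

    # Rank terms seen in positives; break ties alphabetically.
    ranked = sorted(pos_df, key=lambda t: (-score(t), t))
    features = set(ranked[:top_k])

    # Retain only negatives containing at least one selected feature.
    kept = [doc for doc in negatives if features & set(doc)]
    return features, kept


# Toy data (hypothetical): 2 positive and 5 negative documents.
positives = [["gpu", "driver", "crash"], ["gpu", "shader", "bug"]]
negatives = [
    ["hockey", "game", "score"],
    ["gpu", "price", "sale"],
    ["election", "vote"],
    ["driver", "license", "renewal"],
    ["recipe", "soup"],
]

features, kept = select_negatives(positives, negatives, top_k=2)
print(sorted(features))
print(kept)
```

The reduced set `kept`, together with all positive documents, would then be passed to whichever SVM trainer is in use. The intuition behind this particular heuristic is that negatives sharing vocabulary with the positive class tend to lie near the decision boundary, so they are the ones most likely to matter for training.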
Pages: 250-254
Page count: 5