Feature-Weighted Sampling for Proper Evaluation of Classification Models

Cited by: 1
Authors
Shin, Hyunseok [1 ]
Oh, Sejong [2 ]
Affiliations
[1] Dankook Univ 152, Dept Comp Sci, Yongin 16890, South Korea
[2] Dankook Univ, Dept Software Sci, Yongin 16890, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 5
Keywords
classification; training and test sets; sampling; feature importance; evaluation;
DOI
10.3390/app11052039
Chinese Library Classification
O6 [Chemistry];
Discipline Classification Code
0703;
Abstract
In machine learning applications, classification schemes are widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Random sampling is traditionally used to perform this division. The problem, however, is that the evaluated performance of the model varies depending on how the training and test sets are divided. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generate numerous candidate train/test splits using the R-value-based sampling method. We then evaluate how closely the distribution of each candidate matches that of the whole dataset, and the split with the smallest distribution difference is selected as the final train/test set. Histograms and feature importance are used to measure the similarity of distributions. The proposed method produces more appropriate training and test sets than previous sampling methods, including random and non-random sampling.
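The selection procedure described in the abstract can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' implementation: it generates candidate splits with plain stratified random sampling (standing in for the paper's R-value-based sampling), weights per-feature histogram differences by random-forest feature importance, and keeps the split whose training-set distribution is closest to that of the whole dataset. The dataset, estimator, bin count, and number of candidates are illustrative assumptions.

```python
# Minimal sketch of distribution-aware train/test split selection.
# Candidate splits come from stratified random sampling here; the paper
# instead generates them with R-value-based sampling.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def histogram_distance(col_subset, col_full, bins=10):
    """Sum of absolute differences between normalized histograms of one feature."""
    lo, hi = col_full.min(), col_full.max()
    h_sub, _ = np.histogram(col_subset, bins=bins, range=(lo, hi))
    h_full, _ = np.histogram(col_full, bins=bins, range=(lo, hi))
    # Normalize counts so subsets of different sizes are comparable.
    return np.abs(h_sub / h_sub.sum() - h_full / h_full.sum()).sum()

X, y = load_iris(return_X_y=True)

# Feature importance estimated on the whole dataset (illustrative choice of estimator).
weights = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y).feature_importances_

best_split, best_score = None, np.inf
for seed in range(50):  # 50 candidate splits, an arbitrary illustrative number
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Feature-weighted histogram difference between the training subset and the full data.
    score = sum(w * histogram_distance(X_tr[:, j], X[:, j]) for j, w in enumerate(weights))
    if score < best_score:
        best_score, best_split = score, (X_tr, X_te, y_tr, y_te)

print("smallest weighted distribution difference:", round(best_score, 4))
```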
Pages: 1-18
Number of pages: 17