In machine learning applications, classification schemes are widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into a training set and a test set; the training set is used to build the model, and the test set is used to evaluate it. Random sampling is the traditional way to perform this division. The problem, however, is that the evaluated performance of the model varies depending on how the training and test sets are divided. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generate numerous candidate train/test splits using the R-value-based sampling method. We then evaluate how closely the distribution of each candidate split matches that of the whole dataset, and the split with the smallest distributional difference is selected as the final train/test set. Histograms and feature importance are used to measure the similarity of the distributions. The proposed method produces more representative training and test sets than previous sampling methods, including both random and non-random sampling.
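To make the selection procedure concrete, the sketch below implements the candidate-generation and distribution-matching loop in Python. Since the details of R-value-based sampling are not given here, stratified random splits stand in as the candidate generator, and feature importances from a random forest are used as one plausible way to weight per-feature histogram differences; the function names, the L1 histogram distance, and the weighting scheme are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def histogram_difference(cand_X, full_X, weights, bins=10):
    """Weighted mean per-feature histogram distance between a candidate
    training set and the whole dataset: L1 distance between normalized
    histograms computed over shared bin edges."""
    diffs = np.empty(full_X.shape[1])
    for j in range(full_X.shape[1]):
        # Shared bin edges so the two histograms are directly comparable.
        edges = np.histogram_bin_edges(full_X[:, j], bins=bins)
        h_cand, _ = np.histogram(cand_X[:, j], bins=edges)
        h_full, _ = np.histogram(full_X[:, j], bins=edges)
        h_cand = h_cand / max(h_cand.sum(), 1)
        h_full = h_full / max(h_full.sum(), 1)
        diffs[j] = np.abs(h_cand - h_full).sum()
    return float(np.average(diffs, weights=weights))

def select_best_split(X, y, n_candidates=100, test_size=0.3, seed=0):
    """Generate candidate splits and keep the one whose training-set
    feature distributions most closely match the whole dataset."""
    # Assumption: importances from a forest fit on the whole dataset
    # weight the per-feature distribution differences.
    importances = RandomForestClassifier(
        n_estimators=100, random_state=seed).fit(X, y).feature_importances_
    rng = np.random.RandomState(seed)
    best, best_score = None, np.inf
    for _ in range(n_candidates):
        split = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=rng.randint(2**31 - 1))
        score = histogram_difference(split[0], X, importances)
        if score < best_score:
            best, best_score = split, score
    return best, best_score

X, y = load_iris(return_X_y=True)
(X_tr, X_te, y_tr, y_te), score = select_best_split(X, y)
print(f"best split distribution difference: {score:.4f}")
```

The key design choice is that the split is chosen by an explicit distributional criterion rather than by a single random draw, so the reported test performance is less sensitive to sampling luck; any distance between histograms (or a different importance weighting) could be substituted without changing the overall loop.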