Data preprocessing in semi-supervised SVM classification

Cited by: 16
Authors
Astorino, A. [2]
Gorgone, E. [1]
Gaudioso, M. [1]
Pallaschke, D. [3]
Affiliations
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemist, I-87036 Arcavacata Di Rende, CS, Italy
[2] CNR, Ist Calcolo & Reti Ad Alte Prestaz, I-87036 Arcavacata Di Rende, CS, Italy
[3] Univ Karlsruhe, Inst Operat Res, D-76128 Karlsruhe, Germany
Keywords
data classification; semi-supervised learning; SVM; nonsmooth optimization; optimization techniques
DOI
10.1080/02331931003692557
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
The literature on semi-supervised binary classification has shown that useful information can be gathered not only from samples whose class membership is known in advance, but also from unlabelled ones. In semi-supervised support vector machine (SVM) models, both labelled and unlabelled samples contribute to the definition of an optimization model for finding a good-quality separating hyperplane. The optimization approaches devised in this context are basically of two types: a mixed integer linear programming problem, and a continuous optimization problem whose objective function is nonsmooth and nonconvex. Both problems become hard to solve as the number of unlabelled points increases. In this article, we present a data preprocessing technique that aims to reduce the number of unlabelled points entering the computational model without excessively degrading the classification performance of the overall process. The approach is based on the concept of separating sets and can be implemented with reasonable computational effort. Results of numerical experiments on several benchmark datasets are also reported.
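As an illustration of why the continuous model is nonsmooth and nonconvex, the sketch below evaluates a formulation commonly used in the semi-supervised SVM literature: a margin term, a hinge loss on labelled points, and a hinge loss on the absolute value of the decision function for unlabelled points. This is an assumption made for illustration; the abstract does not state the exact objective used in the article, and the function name `s3vm_objective` and the weights `c_lab` and `c_unl` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def s3vm_objective(w, b, labelled, unlabelled, c_lab=1.0, c_unl=1.0):
    """Evaluate an assumed semi-supervised SVM objective at (w, b).

    labelled:   iterable of (x, y) pairs with y in {+1, -1}
    unlabelled: iterable of points x without labels
    """
    # Margin term: 0.5 * ||w||^2
    margin_term = 0.5 * dot(w, w)
    # Hinge loss on labelled points (convex, nonsmooth at the kink)
    lab_loss = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in labelled)
    # Penalty for unlabelled points falling inside the margin;
    # the absolute value makes this term nonconvex
    unl_loss = sum(max(0.0, 1.0 - abs(dot(w, x) + b)) for x in unlabelled)
    return margin_term + c_lab * lab_loss + c_unl * unl_loss


# Toy data: two well-separated labelled points, one unlabelled point in the margin
labelled = [((2.0, 0.0), 1), ((-2.0, 0.0), -1)]
unlabelled = [(0.5, 0.0)]
w, b = (1.0, 0.0), 0.0

full_value = s3vm_objective(w, b, labelled, unlabelled)
reduced_value = s3vm_objective(w, b, labelled, [])  # unlabelled point dropped
```

Each unlabelled point adds one nonconvex term to the objective, so dropping unlabelled points before optimization, as the preprocessing technique aims to do, directly shrinks the problem. How the article selects which points to drop (via separating sets) is not detailed in the abstract and is not reproduced here.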
Pages: 143-151
Page count: 9