Active learning support vector machines for optimal sample selection in classification

Cited by: 25
Authors
Zomer, S
Sánchez, MDN
Brereton, RG
Pavón, JLP
Affiliations
[1] Univ Bristol, Sch Chem, Ctr Chemometr, Bristol BS8 1TS, Avon, England
[2] Univ Salamanca, Fac Ciencias Quim, Dept Quim Analit Nutr & Bromatol, E-37008 Salamanca, Spain
Keywords
sample selection; active learning; classification; support vector machines
DOI
10.1002/cem.872
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Labelling samples can cause significant delays, particularly when dealing with larger datasets and/or when labelling implies prolonged analysis. In such cases, a strategy that allows a reliable classifier to be constructed from a minimal-sized training set, by labelling only a minor fraction of the samples, can be advantageous. Support vector machines (SVMs) are ideal for such an approach because the classifier relies on only a small subset of samples, namely the support vectors, while being independent of the remaining ones, which typically form the majority of the dataset. This paper describes a procedure in which an SVM classifier is constructed with support vectors systematically retrieved from the pool of unlabelled samples. The procedure is termed 'active' because the algorithm interacts with the samples prior to their labelling rather than waiting passively for the input. The learning behaviour on simulated datasets is analysed, and a practical application for the detection of hydrocarbons in soils using mass spectrometry is described. Results on simulations show that the active learning SVM performs optimally on datasets where the classes display an intermediate level of separation. In the real case study, the classifier correctly assesses the membership of all samples in the original dataset while requiring labels for only around 14% of the data. Its subsequent application to a second dataset of analogous nature also provides perfect classification without further labelling, giving the same outcome as most classical techniques based on the entirely labelled original dataset. Copyright (C) 2004 John Wiley & Sons, Ltd.
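The pool-based procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact selection rule, so the sketch assumes the common heuristic of querying the unlabelled sample closest to the current hyperplane (the most likely support vector), and it swaps in a simple Pegasos-style linear SVM written in plain Python. The names `train_linear_svm`, `active_learn`, `make_pool`, `oracle` and `budget` are hypothetical, and the two-class Gaussian data are simulated purely for illustration.

```python
import random

def decision(w, x):
    """Signed score of sample x under the linear model w."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient descent for a linear soft-margin SVM.

    Each sample is assumed to carry a trailing constant 1.0, so the last
    weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            violated = y[i] * decision(w, X[i]) < 1.0  # inside the margin?
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def active_learn(pool, oracle, budget, seed=0):
    """Label at most `budget` pool samples, chosen by margin sampling."""
    rng = random.Random(seed)
    unlabelled = list(range(len(pool)))
    rng.shuffle(unlabelled)
    labelled, labels = [], []
    # Seed step: label random samples until both classes have been seen.
    while len(set(labels)) < 2 and unlabelled:
        i = unlabelled.pop()
        labelled.append(i)
        labels.append(oracle(i))
    w = train_linear_svm([pool[i] for i in labelled], labels)
    while len(labelled) < budget and unlabelled:
        # Query the unlabelled sample closest to the decision boundary.
        i = min(unlabelled, key=lambda k: abs(decision(w, pool[k])))
        if abs(decision(w, pool[i])) >= 1.0:
            break  # nothing left inside the margin, so stop querying
        unlabelled.remove(i)
        labelled.append(i)
        labels.append(oracle(i))
        w = train_linear_svm([pool[i] for i in labelled], labels)
    return w, labelled

def make_pool(n_per_class=100, centre=3.0, seed=1):
    """Simulated two-class Gaussian pool (illustrative data only)."""
    rng = random.Random(seed)
    pool, truth = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            pool.append([rng.gauss(label * centre, 1.0),
                         rng.gauss(label * centre, 1.0), 1.0])
            truth.append(label)
    return pool, truth

pool, truth = make_pool()
w, labelled = active_learn(pool, oracle=lambda i: truth[i], budget=20)
accuracy = sum((decision(w, x) > 0) == (t > 0)
               for x, t in zip(pool, truth)) / len(pool)
print(f"labelled {len(labelled)} of {len(pool)} samples, "
      f"pool accuracy {accuracy:.2f}")
```

Here `oracle` stands in for the costly labelling step (for instance a laboratory analysis), and `budget` caps how many samples are sent for labelling. The early stop when no unlabelled sample lies inside the margin is a design choice of this sketch, not something taken from the paper.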
Pages: 294-305
Page count: 12