Accelerating the SVM learning for very large data sets

Cited by: 0
Authors
Sung, Eric [1]
Yan, Zhu [1]
Li Xuchun [1]
Affiliation
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose an original sequential learning algorithm, SBA, that enables the SVM to learn efficiently from only a small subset of the input data set. The principle is to sequentially add convex hull points of the two classes to a small training pool. The SVM is trained on the current training pool, and its result is used to find the data point that is wrongly classified and furthest from the current optimal hyperplane. This point is added to the training pool and the SVM is retrained on it. The iteration stops when no more such points are found. A formal proof of strict convergence is provided, and we derive a geometric bound on the training time. We explain how SBA can be extended to handle non-linear and non-separable class distributions. Experimental trials on some well-known data sets verify the speed advantage of our method, coupled to any SVM, over that same SVM used alone and over the core vector machine.
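The loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' SBA implementation: a plain perceptron stands in for the SVM trainer so the example has no dependencies, the pool is seeded with the first example of each class (the abstract does not say how the pool is initialised), and the data are assumed to be linearly separable. The names `train_perceptron`, `decision`, and `sequential_fit` are hypothetical.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Stand-in linear trainer (NOT an SVM): returns (w, b) for the pool."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Classic perceptron update on a misclassified pool point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def decision(w, b, x):
    """Signed, unnormalised score of x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sequential_fit(points, labels):
    """Grow a small training pool until no point outside it is misclassified."""
    # Assumption: seed the pool with the first example of each class.
    pool = [labels.index(1), labels.index(-1)]
    while True:
        w, b = train_perceptron([points[i] for i in pool],
                                [labels[i] for i in pool])
        norm = sum(wi * wi for wi in w) ** 0.5 or 1.0
        worst, worst_dist = None, -1.0
        for i, (x, y) in enumerate(zip(points, labels)):
            margin = y * decision(w, b, x)
            if margin <= 0 and i not in pool:
                dist = -margin / norm  # distance on the wrong side
                if dist > worst_dist:
                    worst, worst_dist = i, dist
        if worst is None:  # no misclassified point left outside the pool
            return w, b, pool
        pool.append(worst)  # add the furthest misclassified point, retrain
```

Because each pass adds a point that is not yet in the pool, the loop runs at most once per data point; the pool it returns is typically much smaller than the full data set when the classes are well separated.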
Pages: 484 / +
Page count: 2