Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliations
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI
10.1007/978-3-319-01595-8_10
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. Still, the high computational cost due to the cubic runtime complexity is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM: a simple, stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of all considered training sets and thereby obtain a significant speedup. Another relevant advantage is that the approach is easily parallelized, because a number of independent models have to be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and contains an adaptive stopping criterion to select the number of stages for improved accuracy.
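The cascade procedure described in the abstract can be sketched compactly. Below is a minimal, single-machine sketch in Python with scikit-learn, assuming a binary-tree cascade over a power-of-two number of partitions; the function names (cascade_svm, fit_and_keep_svs), the default of eight partitions, and the use of SVC are illustrative assumptions, not the authors' implementation. The per-partition fits within each stage are mutually independent, which is exactly where parallel execution (e.g. with joblib) would enter.

# Minimal sketch of a Cascade SVM (Graf et al., 2005) as described in the
# abstract above. Function names, the partition count, and the SVC settings
# are illustrative assumptions, not the authors' implementation.
import numpy as np
from sklearn.svm import SVC

def fit_and_keep_svs(X, y, **svc_params):
    """Fit an SVM on one training set and return only its support vectors."""
    model = SVC(**svc_params).fit(X, y)
    return X[model.support_], y[model.support_]

def cascade_svm(X, y, n_partitions=8, random_state=0, **svc_params):
    """Binary-tree cascade: split the data, fit SVMs, merge the support
    vectors pairwise, and repeat until a single training set remains.
    n_partitions is assumed to be a power of two."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    parts = [(X[i], y[i]) for i in np.array_split(idx, n_partitions)]
    while len(parts) > 1:
        # These fits are mutually independent and could run in parallel.
        svs = [fit_and_keep_svs(Xp, yp, **svc_params) for Xp, yp in parts]
        # Merge the support-vector sets pairwise for the next cascade stage.
        parts = [(np.vstack((a[0], b[0])), np.concatenate((a[1], b[1])))
                 for a, b in zip(svs[0::2], svs[1::2])]
    X_last, y_last = parts[0]
    # Final model, trained only on the surviving support vectors.
    return SVC(**svc_params).fit(X_last, y_last)

For instance, calling cascade_svm(X, y, n_partitions=8, C=1.0, kernel="rbf") on a large training set fits eight small SVMs in the first stage, four in the second, and so on, so that no single fit ever sees the full data set.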
Pages: 87-95
Number of pages: 9
Related Papers
50 records in total
  • [1] Training Support Vector Machines on Large Sets of Image Data
    Kukenys, Ignas
    McCane, Brendan
    Neumegen, Tim
    COMPUTER VISION - ACCV 2009, PT III, 2010, 5996 : 331 - 340
  • [2] Using the Leader Algorithm with Support Vector Machines for Large Data Sets
    Romero, Enrique
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I, 2011, 6791 : 225 - 232
  • [3] Using support vector machines for mining regression classes in large data sets
    Sun, ZH
    Gao, LX
    Sun, YX
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 89 - 92
  • [4] Parallel decomposition approaches for training support vector machines
    Serafini, T
    Zanghirati, G
    Zanni, L
    PARALLEL COMPUTING: SOFTWARE TECHNOLOGY, ALGORITHMS, ARCHITECTURES AND APPLICATIONS, 2004, 13 : 259 - 266
  • [5] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
  • [6] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2008, 4994 : 38 - 47
  • [7] One-class support vector machines for large-scale data sets
    Wang, H.
    2013, Southeast University (43)
  • [8] Data mining with parallel support vector machines for classification
    Eitrich, Tatjana
    Lang, Bruno
    ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, 2006, 4243 : 197 - 206
  • [9] Parallel tuning of support vector machine learning parameters for large and unbalanced data sets
    Eitrich, T
    Lang, B
    COMPUTATIONAL LIFE SCIENCES, PROCEEDINGS, 2005, 3695 : 253 - 264