Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Institution
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI
10.1007/978-3-319-01595-8_10
CLC Classification: TP [Automation & Computer Technology]
Discipline Code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. Still, their high computational cost, due to cubic runtime complexity, is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM. It is a simple, stepwise procedure in which the SVM is iteratively trained on subsets of the original data set, and the support vectors of the resulting models are combined to create new training sets. The general idea is to bound the size of all considered training sets and thereby obtain a significant speedup. Another relevant advantage is that this approach can easily be parallelized, because several independent models must be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping criterion that selects the number of stages for improved accuracy.
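The core cascade idea described in the abstract (train SVMs on independent subsets, keep only their support vectors, merge, and retrain) can be sketched in a few lines. This is a minimal single-merge sketch, not the authors' multi-stage implementation: the function name `cascade_svm`, the use of scikit-learn's `SVC`, and the single merge stage are illustrative assumptions; Graf et al.'s cascade uses multiple layers with feedback, and the subset fits below could run in parallel.

```python
# Single-stage Cascade SVM sketch (assumption: one merge layer only;
# the original cascade iterates this over several layers with feedback).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

def cascade_svm(X, y, n_splits=4, **svm_params):
    """Fit one SVM per disjoint subset (these fits are independent and
    could run in parallel), collect each model's support vectors, then
    fit a final SVM on the merged, much smaller support-vector set."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    sv_idx = []
    for part in np.array_split(idx, n_splits):
        model = SVC(**svm_params).fit(X[part], y[part])
        # model.support_ indexes into the subset; map back to original rows
        sv_idx.extend(part[model.support_])
    sv_idx = np.asarray(sv_idx)
    return SVC(**svm_params).fit(X[sv_idx], y[sv_idx])

model = cascade_svm(X, y, n_splits=4, kernel="rbf", C=1.0)
print(model.score(X, y))
```

The speedup comes from the bound on training-set size: each subset fit and the final fit see far fewer points than the full data set, which matters given the roughly cubic training cost of a standard SVM.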
Pages: 87 - 95
Page count: 9
Related Papers
(50 records in total)
  • [21] Classifying data sets using posterior probability for multiclass support vector machines
    Wang, Hongmei
    Zeng, Yuan
    Zhao, Zheng
    Wang, Chengshan
    Journal of Computational Information Systems, 2008, 4 (02): : 541 - 546
  • [22] Parallel Computing of Support Vector Machines: A Survey
    Tavara, Shirin
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [23] Core vector machines: Fast SVM training on very large data sets
    Tsang, IW
    Kwok, JT
    Cheung, PM
    JOURNAL OF MACHINE LEARNING RESEARCH, 2005, 6 : 363 - 392
  • [24] Using support vector machines for classifying large sets of multi-represented objects
    Kriegel, HP
    Kröger, P
    Pryakhin, A
    Schubert, M
    PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2004, : 102 - 113
  • [25] Data condensation in large databases by incremental learning with support vector machines
    Mitra, P
    Murthy, CA
    Pal, SK
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000, : 708 - 711
  • [26] Feature analysis: Support vector machines approaches
    Shi, Y
    Zhang, TX
    IMAGE EXTRACTION, SEGMENTATION, AND RECOGNITION, 2001, 4550 : 245 - 251
  • [27] Goal programming approaches to support vector machines
    Nakayama, H
    Yun, Y
    Asada, T
    Yoon, M
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 356 - 363
  • [28] A simple decomposition method for support vector machines
    Hsu, CW
    Lin, CJ
    MACHINE LEARNING, 2002, 46 (1-3) : 291 - 314
  • [30] Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets
    Carbon-Mangels, Miriam
    Hutter, Michael C.
    MOLECULAR INFORMATICS, 2011, 30 (10) : 885 - 895