Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliations
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI: 10.1007/978-3-319-01595-8_10
Chinese Library Classification (CLC): TP [Automation and computer technology]
Discipline code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification, but their cubic training-time complexity makes them costly on larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM: a simple, stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of every training set considered and thereby obtain a significant speedup. A further advantage is that the approach is easy to parallelize, since a number of independent models must be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping criterion that selects the number of stages for improved accuracy.
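The cascade described in the abstract can be sketched in a few lines: split the active training set into subsets, fit an SVM on each (these fits are independent, hence parallelizable), keep only the support vectors, merge, and repeat with fewer subsets. The sketch below is a minimal single-pass illustration, not the authors' implementation; `fit_support` is a hypothetical caller-supplied routine that trains an SVM on a subset and returns the subset-local indices of its support vectors (with scikit-learn this could be `lambda Xs, ys: SVC().fit(Xs, ys).support_`).

```python
import random

def cascade_svm_fit(X, y, fit_support, n_subsets=4, n_stages=3, seed=0):
    """One simple Cascade-SVM pass (sketch, after Graf et al., 2005).

    fit_support(X_sub, y_sub) -> indices into X_sub that are support
    vectors of an SVM trained on that subset (pluggable, e.g. sklearn).
    Returns the indices of the final, reduced training set.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))          # active training indices
    rng.shuffle(idx)
    for _ in range(n_stages):
        k = max(1, n_subsets)
        size = -(-len(idx) // k)       # ceil(len(idx) / k)
        parts = [idx[i:i + size] for i in range(0, len(idx), size)]
        survivors = []
        for part in parts:             # independent fits: parallelizable
            Xs = [X[i] for i in part]
            ys = [y[i] for i in part]
            # keep only this subset's support vectors
            survivors.extend(part[j] for j in fit_support(Xs, ys))
        idx = survivors
        n_subsets = max(1, n_subsets // 2)  # merge subsets as cascade narrows
    return idx
```

Because each stage's subset fits share no state, they can be handed to e.g. `concurrent.futures.ProcessPoolExecutor` directly; the sequential loop above is kept only for clarity.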
Pages: 87-95 (9 pages)