Decision trees using local support vector regression models for large datasets

Cited by: 0
Authors
Tran-Nguyen, Minh-Thu [1]
Bui, Le-Diem [1,3]
Do, Thanh-Nghi [1,2]
Affiliations
[1] Can Tho Univ, Coll Informat Technol, Can Tho, Vietnam
[2] Pierre & Marie Curie Univ, Sorbonne Univ, UMI UMMISCO 209 IRD UPMC, Paris 6, Paris, France
[3] Gyeongsang Natl Univ, Comp Sci Dept, AI Lab, Jinju, South Korea
Keywords
Support vector regression (SVR); decision tree; local support vector regression (local SVR); ensemble learning; large datasets; CLASSIFICATION; HYPERPLANE
DOI
10.1080/24751839.2019.1686682
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Our proposed decision trees using local support vector regression models (tSVR, rtSVR) aim to handle regression tasks on large datasets efficiently. The tSVR learning algorithm proceeds in two main steps. The first constructs a decision tree regressor that partitions the full training dataset into k terminal nodes (subsets); the second learns an SVR model from each terminal node, in parallel on multi-core computers, to predict the data locally. The rtSVR algorithm learns a random forest of decision trees with local SVR models to improve prediction correctness over a single tSVR model. The performance analysis shows that tSVR and rtSVR are efficient in terms of algorithmic complexity and generalization ability compared with the classical SVR. Experimental results on five large datasets from the UCI repository show that tSVR and rtSVR train non-linear regression models on large datasets faster than the standard SVR while achieving high prediction correctness. On average, tSVR and rtSVR train 1282.66 and 482.29 times faster than the standard SVR, respectively; furthermore, tSVR and rtSVR improve the relative prediction correctness by 59.43% and 63.70%, respectively, compared with the standard SVR.
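The two-step tSVR procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and joblib are available, and the function names `fit_tsvr` and `predict_tsvr`, the `min_leaf` parameter, and the way leaves are mapped to models are all hypothetical choices made for illustration.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def fit_tsvr(X, y, min_leaf=50, n_jobs=1, **svr_params):
    """Sketch of tSVR training: tree partition, then one SVR per leaf.

    `min_leaf` (hypothetical name) bounds the leaf size and therefore
    indirectly controls k, the number of terminal nodes.
    """
    # Step 1: a decision tree regressor partitions the training set.
    tree = DecisionTreeRegressor(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X, y)
    leaf_of = tree.apply(X)        # leaf id of every training row
    leaves = np.unique(leaf_of)    # the k terminal nodes

    # Step 2: fit one local SVR per leaf. The fits are independent,
    # so they can be dispatched in parallel across cores.
    def fit_leaf(leaf):
        mask = leaf_of == leaf
        return SVR(**svr_params).fit(X[mask], y[mask])

    models = Parallel(n_jobs=n_jobs)(delayed(fit_leaf)(leaf) for leaf in leaves)
    return tree, dict(zip(leaves, models))


def predict_tsvr(tree, local_models, X):
    """Route each row to its leaf and predict with that leaf's SVR."""
    leaf_of = tree.apply(X)
    y_pred = np.empty(X.shape[0])
    for leaf, model in local_models.items():
        mask = leaf_of == leaf
        if mask.any():
            y_pred[mask] = model.predict(X[mask])
    return y_pred
```

Because each local SVR sees only the rows of one terminal node, every individual fit works on a much smaller problem than a single SVR over the full dataset, which is the intuition behind the reported training speed-up. An rtSVR-style ensemble could, under the same assumptions, average the predictions of several such models; that step is omitted here.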
Pages: 17-35
Page count: 19
Related papers
  • [1] Decision Tree Using Local Support Vector Regression for Large Datasets
    Minh-Thu Tran-Nguyen
    Bui, Le-Diem
    Kim, Yong-Gi
    Thanh-Nghi Do
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT I, 2018, 10751 : 255 - 265
  • [2] Parallel Algorithm of Local Support Vector Regression for Large Datasets
    Le-Diem Bui
    Minh-Thu Tran-Nguyen
    Kim, Yong-Gi
    Thanh-Nghi Do
    [J]. FUTURE DATA AND SECURITY ENGINEERING, 2017, 10646 : 139 - 153
  • [3] Selecting rows and columns for training support vector regression models with large retail datasets
    Ali, Ozden Gur
    Yaman, Kubra
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2013, 226 (03) : 471 - 480
  • [4] Fast Local Support Vector Machines for Large Datasets
    Segata, Nicola
    Blanzieri, Enrico
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 295 - +
  • [5] Support vector machine classification for large datasets using decision tree and Fisher linear discriminant
    Lopez Chau, Asdrubal
    Li, Xiaoou
    Yu, Wen
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 36 : 57 - 65
  • [6] Efficient and Private Scoring of Decision Trees, Support Vector Machines and Logistic Regression Models Based on Pre-Computation
    De Cock, Martine
    Dowsley, Rafael
    Horst, Caleb
    Katti, Raj
    Nascimento, Anderson C. A.
    Poon, Wing-Sea
    Truex, Stacey
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2019, 16 (02) : 217 - 230
  • [7] An investigation of the factors influencing cost system functionality using decision trees, support vector machines and logistic regression
    Kuzey, Cemil
    Uyar, Ali
    Delen, Dursun
    [J]. INTERNATIONAL JOURNAL OF ACCOUNTING AND INFORMATION MANAGEMENT, 2019, 27 (01) : 27 - 55
  • [8] Automated Analysis of Regularities Between Model Parameters and Output Using Support Vector Regression in Conjunction with Decision Trees
    Edali, Mert
    Yucel, Gonenc
    [J]. JASSS-THE JOURNAL OF ARTIFICIAL SOCIETIES AND SOCIAL SIMULATION, 2018, 21 (04):
  • [9] A Parallel Algorithm to Induce Decision Trees for Large Datasets
    Franco-Arcega, A.
    Suarez-Cansino, J.
    Flores-Flores, L. G.
    [J]. 2013 XXIV INTERNATIONAL SYMPOSIUM ON INFORMATION, COMMUNICATION AND AUTOMATION TECHNOLOGIES (ICAT), 2013,
  • [10] Multivariate Decision Trees Using Different Splitting Attribute Subsets for Large Datasets
    Franco-Arcega, Anilu
    Ariel Carrasco-Ochoa, Jose
    Sanchez-Diaz, Guillermo
    Fco Martinez-Trinidad, Jose
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2010, 6085 : 370 - +