Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets

Cited by: 0
Authors:
Severyn, Aliaksei [1 ]
Moschitti, Alessandro [1 ]
Affiliations:
[1] Univ Trento, Dept Comp Sci & Engn, I-38123 Povo, TN, Italy
Source:
ETERNAL SYSTEMS | 2012 / Vol. 255
Keywords:
Machine Learning; Kernel Methods; Structural Kernels; Support Vector Machine; Natural Language Processing
DOI:
not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Much of the success of machine learning can be attributed to the ability of learning methods to adequately represent, extract, and exploit the structure inherent in the data of interest. Kernel methods are a rich family of techniques that capitalize on this principle. Domain-specific kernels can exploit rich structural information in the input data to deliver state-of-the-art results in many application areas, e.g., natural language processing (NLP), bioinformatics, and computer vision. The use of kernels to capture relationships in the input data has made the Support Vector Machine (SVM) the state-of-the-art tool in many application areas. Nevertheless, kernel learning remains a computationally expensive process. The contribution of this paper is to make learning with structural kernels, e.g., tree kernels, more applicable to real-world large-scale tasks. More specifically, we propose two important enhancements of the approximate cutting-plane algorithm for training Support Vector Machines with structural kernels: (i) a new sampling strategy to handle the class-imbalance problem; and (ii) a parallel implementation, which makes training scale almost linearly with the number of CPUs. We also show that the theoretical convergence bounds are preserved for the improved algorithm. The experimental evaluation demonstrates the soundness of our approach and the possibility of carrying out large-scale learning with structural kernels.
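The abstract does not spell out the details of the proposed sampling strategy, so the following is only a minimal illustrative sketch of the general idea of class-balanced sampling on an imbalanced training set: draw each example by first picking a class uniformly at random, then picking uniformly within that class. The function name `balanced_sample` and all of its parameters are hypothetical and not taken from the paper.

```python
import random

def balanced_sample(examples, labels, n, seed=0):
    """Draw n (example, label) pairs with equal probability mass per class.

    Hypothetical sketch: rather than sampling uniformly from a skewed
    dataset (which would rarely touch the minority class), first pick a
    class uniformly, then pick an example uniformly within that class.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    classes = sorted(by_class)
    batch = []
    for _ in range(n):
        c = rng.choice(classes)                      # pick a class uniformly
        batch.append((rng.choice(by_class[c]), c))   # then an example within it
    return batch

# Usage: a 1:9 imbalanced toy set still yields a roughly balanced batch.
data = ["pos"] * 10 + ["neg"] * 90
labels = [1] * 10 + [-1] * 90
batch = balanced_sample(data, labels, 1000)
pos_frac = sum(1 for _, y in batch if y == 1) / len(batch)
```

On this 1:9 toy set the batch comes out roughly half positive, so the minority class is no longer starved during training; the paper's actual strategy operates inside the cutting-plane solver and additionally preserves its convergence guarantees.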
Pages: 34 / 41
Page count: 8
Related Papers:
50 items in total
  • [41] Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections
    García-Pedrajas N.
    García-Osorio C.
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2013, 2 (1) : 29 - 44
  • [42] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [43] Class-imbalanced complementary-label learning via weighted loss
    Wei, Meng
    Zhou, Yong
    Li, Zhongnian
    Xu, Xinzheng
    NEURAL NETWORKS, 2023, 166 : 555 - 565
  • [44] Minority Class Oriented Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9920 - 9927
  • [45] Datasets, tasks, and training methods for large-scale hypergraph learning
    Kim, Sunwoo
    Lee, Dongjin
    Kim, Yul
    Park, Jungho
    Hwang, Taeho
    Shin, Kijung
    DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (06) : 2216 - 2254
  • [46] Learning Bayesian Network Structure from Large-scale Datasets
    Hong, Yu
    Xia, Xiaoling
    Le, Jiajin
    Zhou, Xiangdong
    2016 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD 2016), 2016, : 258 - 264
  • [47] Learning From Noisy Large-Scale Datasets With Minimal Supervision
    Veit, Andreas
    Alldrin, Neil
    Chechik, Gal
    Krasin, Ivan
    Gupta, Abhinav
    Belongie, Serge
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6575 - 6583
  • [48] MMSVC: An Efficient Unsupervised Learning Approach for Large-Scale Datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    LIFE SYSTEM MODELING AND INTELLIGENT COMPUTING, 2010, 6330 : 1 - 9
  • [49] MMSVC: An efficient unsupervised learning approach for large-scale datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    NEUROCOMPUTING, 2012, 98 : 114 - 122