Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets

Cited by: 0
Authors:
Severyn, Aliaksei [1 ]
Moschitti, Alessandro [1 ]
Affiliations:
[1] Univ Trento, Dept Comp Sci & Engn, I-38123 Povo, TN, Italy
Source:
ETERNAL SYSTEMS | 2012 / Vol. 255
Keywords:
Machine Learning; Kernel Methods; Structural Kernels; Support Vector Machine; Natural Language Processing
DOI:
not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Much of the success of machine learning can be attributed to the ability of learning methods to adequately represent, extract, and exploit the structure inherent in the data of interest. Kernel methods are a rich family of techniques that capitalize on this principle. Domain-specific kernels can exploit rich structural information in the input data to deliver state-of-the-art results in many application areas, e.g., natural language processing (NLP), bioinformatics, and computer vision. The use of kernels to capture relationships in the input data has made the Support Vector Machine (SVM) the state-of-the-art tool in many application areas. Nevertheless, kernel learning remains a computationally expensive process. The contribution of this paper is to make learning with structural kernels, e.g., tree kernels, more applicable to real-world large-scale tasks. More specifically, we propose two important enhancements of the approximate cutting-plane algorithm for training Support Vector Machines with structural kernels: (i) a new sampling strategy to handle the class-imbalance problem; and (ii) a parallel implementation, which makes training scale almost linearly with the number of CPUs. We also show that the theoretical convergence bounds are preserved for the improved algorithm. The experimental evaluation demonstrates the soundness of our approach and the possibility of carrying out large-scale learning with structural kernels.
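The abstract does not spell out the details of the proposed sampling strategy, so the following is only a minimal illustrative sketch of the general idea of class-balanced sampling on an imbalanced training set: draw each example by first picking a class uniformly at random, then picking uniformly within that class. The function name `balanced_sample` and all of its parameters are hypothetical and not taken from the paper.

```python
import random

def balanced_sample(examples, labels, n, seed=0):
    """Draw n (example, label) pairs with equal probability mass per class.

    Hypothetical sketch: rather than sampling uniformly from a skewed
    dataset (which would rarely touch the minority class), first pick a
    class uniformly, then pick an example uniformly within that class.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    classes = sorted(by_class)
    batch = []
    for _ in range(n):
        c = rng.choice(classes)                      # pick a class uniformly
        batch.append((rng.choice(by_class[c]), c))   # then an example within it
    return batch

# Usage: a 1:9 imbalanced toy set still yields a roughly balanced batch.
data = ["pos"] * 10 + ["neg"] * 90
labels = [1] * 10 + [-1] * 90
batch = balanced_sample(data, labels, 1000)
pos_frac = sum(1 for _, y in batch if y == 1) / len(batch)
```

On this 1:9 toy set the batch comes out roughly half positive, so the minority class is no longer starved during training; the paper's actual strategy operates inside the cutting-plane solver and additionally preserves its convergence guarantees.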
Pages: 34 / 41
Page count: 8
Related Papers:
50 items in total
  • [41] Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections
    García-Pedrajas N.
    García-Osorio C.
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2013, 2 (1) : 29 - 44
  • [42] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [43] Class-imbalanced complementary-label learning via weighted loss
    Wei, Meng
    Zhou, Yong
    Li, Zhongnian
    Xu, Xinzheng
    NEURAL NETWORKS, 2023, 166 : 555 - 565
  • [44] Minority Class Oriented Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9920 - 9927
  • [45] Datasets, tasks, and training methods for large-scale hypergraph learning
    Kim, Sunwoo
    Lee, Dongjin
    Kim, Yul
    Park, Jungho
    Hwang, Taeho
    Shin, Kijung
    DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (06) : 2216 - 2254
  • [46] Learning Bayesian Network Structure from Large-scale Datasets
    Hong, Yu
    Xia, Xiaoling
    Le, Jiajin
    Zhou, Xiangdong
    2016 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD 2016), 2016, : 258 - 264
  • [47] Learning From Noisy Large-Scale Datasets With Minimal Supervision
    Veit, Andreas
    Alldrin, Neil
    Chechik, Gal
    Krasin, Ivan
    Gupta, Abhinav
    Belongie, Serge
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6575 - 6583
  • [48] MMSVC: An Efficient Unsupervised Learning Approach for Large-Scale Datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    LIFE SYSTEM MODELING AND INTELLIGENT COMPUTING, 2010, 6330 : 1 - 9
  • [49] MMSVC: An efficient unsupervised learning approach for large-scale datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    NEUROCOMPUTING, 2012, 98 : 114 - 122