Large Margin Distribution Learning with Cost Interval and Unlabeled Data

Cited by: 34
Authors
Zhou, Yu-Hang [1]
Zhou, Zhi-Hua [1]
Affiliation
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Margin distribution; cost interval; semi-supervised learning; support vector machines; classification
DOI
10.1109/TKDE.2016.2535283
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In many real-world applications, different types of misclassification incur different costs, but the exact cost is often hard to determine; usually one can only obtain an interval estimate, e.g., that one type of mistake is about 5 to 10 times more serious than the other. On the other hand, abundant unlabeled data are usually available, which has motivated extensive research on semi-supervised learning. Cost intervals and unlabeled data often appear together in practical tasks, yet few studies have tackled them jointly. In this paper, we propose the cisLDM approach, which handles cost intervals and exploits unlabeled data in a principled way. Rather than maximizing the minimum margin as traditional large margin classifiers do, cisLDM optimizes the margin distribution on both labeled and unlabeled data while simultaneously minimizing the worst-case total cost and the mean total cost over the cost interval. Experiments on a broad range of datasets and cost settings demonstrate the strong performance of cisLDM. In particular, cisLDM reduces total cost by 47 percent more than standard SVM and by 27 percent more than a cost-sensitive semi-supervised SVM that assumes the true cost value is known in advance.
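The abstract combines two quantities that can be made concrete with a small computation: statistics of the margin distribution, and the worst-case and mean total cost under a cost interval. The sketch below only evaluates these quantities for a fixed classifier; it is not cisLDM, which optimizes them jointly and also uses unlabeled data. It assumes a linear classifier f(x) = w·x + b, summarizes the margin distribution by the mean and variance of y·f(x) on labeled data, and treats a false negative as costing c in [c_lo, c_hi] times a false positive, with the "mean" cost taken under a uniform distribution over the interval. These modeling choices and all function names are hypothetical.

```python
from statistics import mean, pvariance


def decision_values(w, b, X):
    # Linear score f(x) = w . x + b for each row of X (assumed linear model).
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def margin_statistics(w, b, X, y):
    # Assumption: margin distribution summarized by mean and variance of y_i * f(x_i).
    margins = [yi * s for yi, s in zip(y, decision_values(w, b, X))]
    return mean(margins), pvariance(margins)


def confusion_counts(w, b, X, y):
    # False negatives: true +1 predicted -1; false positives: true -1 predicted +1.
    fn = fp = 0
    for yi, s in zip(y, decision_values(w, b, X)):
        pred = 1 if s >= 0 else -1
        if yi == 1 and pred == -1:
            fn += 1
        elif yi == -1 and pred == 1:
            fp += 1
    return fn, fp


def interval_costs(fn, fp, c_lo, c_hi):
    # Hypothetical cost model: a false positive costs 1, a false negative costs c in [c_lo, c_hi].
    # Total cost c * fn + fp is linear and nondecreasing in c, so the worst case is at c_hi
    # and the mean under a uniform c equals the cost at the midpoint of the interval.
    worst = c_hi * fn + fp
    avg = 0.5 * (c_lo + c_hi) * fn + fp
    return worst, avg


if __name__ == "__main__":
    X = [[2, 0], [0, 1], [1, 0], [0, 3], [-1, 0]]
    y = [1, -1, 1, -1, 1]
    w, b = [1.0, -1.0], 0.0
    print(margin_statistics(w, b, X, y))
    fn, fp = confusion_counts(w, b, X, y)
    print(interval_costs(fn, fp, 5, 10))  # cost ratio "about 5 to 10", as in the abstract
```

On the toy data in the demo, the margin mean is about 1.2, the variance about 1.76, and there is one false negative, so with a cost ratio between 5 and 10 the worst-case total cost is 10 and the mean total cost is 7.5. Because the total cost is linear in c, only the interval endpoints and midpoint matter under this simplified model.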
Pages: 1749-1763
Page count: 15