Learning from Positive and Unlabeled Data with Arbitrary Positive Shift

Cited by: 0
Authors
Hammoudeh, Zayd [1]
Lowd, Daniel [1]
Affiliations
[1] Univ Oregon, Dept Comp & Informat Sci, Eugene, OR 97403 USA
Keywords
COVARIATE SHIFT;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption rarely holds in practice due to temporal drift, domain shift, and/or adversarial manipulation. This paper shows that PU learning is possible even with arbitrarily non-representative positive data given unlabeled data from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We integrate this into two statistically consistent methods to address arbitrary positive bias: one approach combines negative-unlabeled learning with unlabeled-unlabeled learning while the other uses a novel, recursive risk estimator. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive bias, including disjoint positive class-conditional supports. Additionally, we propose a general, simplified approach to address PU risk estimation overfitting.
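For orientation, here is a minimal sketch of the standard PU risk estimate that the abstract takes as its starting point. It is not the paper's method: it relies on the very assumption the paper relaxes, namely that the labeled positives are representative of the target positive class. The sigmoid loss, the clamp at zero (a commonly used non-negative correction against overfitting), and the names `sigmoid_loss`, `nn_pu_risk`, and `prior` are illustrative assumptions.

```python
import math

def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)); small when the margin z is large."""
    return 1.0 / (1.0 + math.exp(margin))

def mean(values):
    return sum(values) / len(values)

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Estimate classification risk from positive and unlabeled scores.

    scores_pos: classifier outputs g(x) on labeled positive examples
    scores_unl: classifier outputs g(x) on unlabeled examples
    prior:      assumed positive-class prior (hypothetical input, taken as known)
    """
    # Risk of the positive class when predicted as positive.
    risk_pos_as_pos = prior * mean([sigmoid_loss(s) for s in scores_pos])
    # Negative-class risk, estimated as (unlabeled treated as negative)
    # minus the positive share hidden inside the unlabeled set.
    risk_unl_as_neg = mean([sigmoid_loss(-s) for s in scores_unl])
    risk_pos_as_neg = prior * mean([sigmoid_loss(-s) for s in scores_pos])
    neg_risk = risk_unl_as_neg - risk_pos_as_neg
    # Clamp at zero: a true risk cannot be negative, so a negative
    # estimate signals overfitting to the finite sample.
    return risk_pos_as_pos + max(0.0, neg_risk)
```

If the labeled positives are biased relative to the target positives, the subtraction term no longer cancels the positive share of the unlabeled set, which is the failure mode the abstract describes.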
Pages: 12
Related papers
50 in total
  • [1] Covariate Shift Adaptation on Learning from Positive and Unlabeled Data
    Sakai, Tomoya
    Shimizu, Nobuyuki
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4838 - 4845
  • [2] Analysis of Learning from Positive and Unlabeled Data
    du Plessis, Marthinus C.
    Niu, Gang
    Sugiyama, Masashi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [3] Learning from positive and unlabeled data: a survey
    Bekker, Jessa
    Davis, Jesse
    [J]. MACHINE LEARNING, 2020, 109 (04) : 719 - 760
  • [4] Learning from positive and unlabeled data: a survey
    Jessa Bekker
    Jesse Davis
    [J]. Machine Learning, 2020, 109 : 719 - 760
  • [5] Convex Formulation for Learning from Positive and Unlabeled Data
    du Plessis, Marthinus Christoffel
    Niu, Gang
    Sugiyama, Masashi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1386 - 1394
  • [6] Predictive Adversarial Learning from Positive and Unlabeled Data
    Hu, Wenpeng
    Le, Ran
    Liu, Bing
    Ji, Feng
    Ma, Jinwen
    Zhao, Dongyan
    Yan, Rui
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7806 - 7814
  • [7] Learning from data streams with only positive and unlabeled data
    Qin, Xiangju
    Zhang, Yang
    Li, Chen
    Li, Xue
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2013, 40 (03) : 405 - 430
  • [8] Positive-Unlabeled Learning from Imbalanced Data
    Su, Guangxin
    Chen, Weitong
    Xu, Miao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2995 - 3001
  • [9] Learning from data streams with only positive and unlabeled data
    Xiangju Qin
    Yang Zhang
    Chen Li
    Xue Li
    [J]. Journal of Intelligent Information Systems, 2013, 40 : 405 - 430
  • [10] Federated Learning with Positive and Unlabeled Data
    Lin, Xinyang
    Chen, Hanting
    Xu, Yixing
    Xu, Chao
    Gui, Xiaolin
    Deng, Yiping
    Wang, Yunhe
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022