Estimating the class prior for positive and unlabelled data via logistic regression

Authors
Małgorzata Łazęcka
Jan Mielniczuk
Paweł Teisseyre
Affiliations
[1] Polish Academy of Sciences, Institute of Computer Science
[2] Warsaw University of Technology, Faculty of Mathematics and Information Sciences
Keywords
Positive unlabelled learning; Class prior estimation; Logistic regression; Non-convex optimisation; Minorization-maximization algorithm; 62H30; 62J12
Abstract
In this paper, we revisit the problem of estimating the class prior probability from positive and unlabelled data gathered in a single-sample scenario. The task is important because, in the positive unlabelled setting, a classifier can be learned successfully if the class prior is available. We show that, without additional assumptions, the class prior probability is not identifiable, and thus existing non-parametric estimators are, in general, necessarily biased. The magnitude of their bias is also investigated. The problem becomes identifiable when the probabilistic structure satisfies mild semi-parametric assumptions. Consequently, we propose a method based on a logistic fit and a concave minorization of its (non-concave) log-likelihood. Experiments conducted on artificial and benchmark datasets, as well as on the large clinical database MIMIC, indicate that the estimation errors of the proposed method are usually lower than those of its competitors and that it is robust against departures from the logistic setting.
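The abstract describes recovering the class prior from a logistic fit to single-sample positive unlabelled data. The sketch below illustrates that idea under loudly stated assumptions: labelled examples are assumed to be selected completely at random (SCAR) from the positives with an unknown label frequency c, and P(y = 1 | x) is assumed to be logistic. Then P(s = 1 | x) = c·σ(β₀ + xᵀβ) and the prior is π = P(y = 1) = P(s = 1)/c. This is not the authors' procedure: instead of their concave minorization (MM) scheme, it maximises the same non-concave likelihood directly with SciPy's L-BFGS-B from several random starts, which can still stop at a local optimum. The names `fit_scaled_logistic` and `estimate_class_prior` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_scaled_logistic(X, s, n_restarts=5, seed=0):
    """Fit P(s=1 | x) = c * sigmoid(b0 + x @ b) by maximum likelihood.

    Hypothetical helper (assumes SCAR + logistic posterior). Uses direct
    L-BFGS-B maximisation, NOT the paper's minorization-maximization
    algorithm. The log-likelihood is non-concave, so several random
    starts are tried and the best one is kept.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    rng = np.random.default_rng(seed)

    def neg_loglik_and_grad(theta):
        beta, a = theta[:-1], theta[-1]
        c = expit(a)                              # label frequency, kept in (0, 1)
        sig = expit(X1 @ beta)                    # logistic P(y=1 | x)
        q = np.clip(c * sig, 1e-12, 1 - 1e-12)    # modelled P(s=1 | x)
        nll = -np.sum(s * np.log(q) + (1 - s) * np.log1p(-q))
        w = (s - q) / (q * (1 - q))               # d loglik / d q
        grad_beta = X1.T @ (w * c * sig * (1 - sig))
        grad_a = np.sum(w * sig) * c * (1 - c)
        return nll, -np.append(grad_beta, grad_a)  # gradient of the NEGATIVE loglik

    best = None
    for _ in range(n_restarts):
        theta0 = rng.normal(scale=0.5, size=X1.shape[1] + 1)
        res = minimize(neg_loglik_and_grad, theta0, jac=True, method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best.x[:-1], expit(best.x[-1])  # (beta_hat, c_hat)


def estimate_class_prior(X, s, **kwargs):
    """Estimate pi = P(y=1) as P(s=1) / c_hat, capped at 1."""
    _, c_hat = fit_scaled_logistic(X, s, **kwargs)
    return min(1.0, float(np.mean(s)) / c_hat)


if __name__ == "__main__":
    # Simulated single-sample PU data (illustrative assumption, not from the paper)
    rng = np.random.default_rng(1)
    n, c_true = 20000, 0.5
    X = rng.normal(size=(n, 2))
    y = rng.random(n) < expit(X @ np.array([3.0, 3.0]))  # true prior close to 0.5 by symmetry
    s = (y & (rng.random(n) < c_true)).astype(float)     # SCAR labelling of positives
    print("true pi:", y.mean(), "estimated pi:", estimate_class_prior(X, s))
```

The random restarts are only a cheap guard against the non-concavity mentioned in the abstract; the paper tackles the same difficulty through a concave minorization of the log-likelihood instead.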
Pages: 1039–1068