Estimating the class prior for positive and unlabelled data via logistic regression

Authors
Małgorzata Łazęcka
Jan Mielniczuk
Paweł Teisseyre
Affiliations
[1] Polish Academy of Sciences, Institute of Computer Science
[2] Warsaw University of Technology, Faculty of Mathematics and Information Sciences
Keywords
Positive unlabelled learning; Class prior estimation; Logistic regression; Non-convex optimisation; Minorization-maximization algorithm; 62H30; 62J12
Abstract
In this paper, we revisit the problem of estimating the class prior probability from positive and unlabelled data gathered in a single-sample scenario. The task is important because, in the positive unlabelled setting, a classifier can be learned successfully when the class prior is available. We show that, without additional assumptions, the class prior probability is not identifiable, and thus the existing non-parametric estimators are necessarily biased in general. The magnitude of their bias is also investigated. The problem becomes identifiable when the probabilistic structure satisfies mild semi-parametric assumptions. Consequently, we propose a method based on a logistic fit and a concave minorization of its (non-concave) log-likelihood. Experiments conducted on artificial and benchmark datasets, as well as on the large clinical database MIMIC, indicate that the estimation errors of the proposed method are usually lower than those of its competitors and that it is robust against departures from the logistic setting.
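The non-identifiability claim in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes the common "selected completely at random" labelling model, under which P(s=1 | x) = c · P(y=1 | x) for an unknown label frequency c, and every name and number in it (the grid, `observed_label_posterior`, the value 0.5) is hypothetical. Under that assumption, the observable quantity P(s=1 | x) is compatible with any c between its maximum and 1, and each choice yields a different class prior P(s=1) / c.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical feature distribution: x uniform on a symmetric grid over [-3, 3].
xs = [-3 + 6 * i / 600 for i in range(601)]


def observed_label_posterior(x):
    # P(s=1 | x), the only posterior that PU data lets us estimate directly.
    # Built here from an arbitrary label frequency 0.5 and a logistic posterior.
    return 0.5 * sigmoid(2.0 * x)


s_post = [observed_label_posterior(x) for x in xs]
p_s = sum(s_post) / len(s_post)  # P(s=1), the labelled fraction
c_min = max(s_post)              # smallest label frequency consistent with the data


def candidate_prior(c):
    """Class prior implied by assuming label frequency c (assumed labelling model)."""
    if not (c_min <= c <= 1.0):
        raise ValueError("c must lie in [max_x P(s=1|x), 1]")
    y_post = [s / c for s in s_post]  # a valid posterior, since every s <= c
    assert all(0.0 <= p <= 1.0 for p in y_post)
    return sum(y_post) / len(y_post)  # equals p_s / c


# Three different priors, all reproducing exactly the same observed P(s=1 | x).
candidates = {c: candidate_prior(c) for c in (0.5, 0.75, 1.0)}
for c, prior in candidates.items():
    print(f"c = {c:.2f} -> class prior = {prior:.4f}")
```

Each candidate reproduces the same observable distribution, so data alone cannot choose between them; some extra structural restriction on the posterior, such as the semi-parametric assumptions mentioned in the abstract, is needed to single out one value.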
Pages: 1039-1068
Number of pages: 29
Source
Łazęcka, M., Mielniczuk, J., Teisseyre, P. Estimating the class prior for positive and unlabelled data via logistic regression. Advances in Data Analysis and Classification, 2021, 15(4): 1039-1068.