Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data

Cited by: 4
Authors
Teisseyre, Pawel [1]
Mielniczuk, Jan [1, 2]
Lazecka, Malgorzata [1, 2]
Affiliations
[1] Polish Acad Sci, Inst Comp Sci, Warsaw, Poland
[2] Warsaw Univ Technol, Fac Math & Informat Sci, Warsaw, Poland
Keywords
Positive unlabelled learning; Logistic regression; Empirical risk minimization; Misspecification
DOI
10.1007/978-3-030-50423-6_1
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In this paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, we shed new light on the properties of the frequently used naive method, in which unlabelled examples are treated as negative. In particular, we show that the naive method is related to an incorrect specification of the logistic model and that, consequently, its parameters are shrunk towards zero. An interesting relationship between the shrinkage parameter and the label frequency is established. Second, we introduce a novel method of fitting the logistic model based on simultaneous estimation of the vector of coefficients and the label frequency. Importantly, the proposed method does not require prior estimation, which is a major obstacle in positive unlabelled learning. The method is superior to both the naive method and the weighted likelihood method in predicting posterior probability on several benchmark data sets. Moreover, it yields a consistently better estimator of label frequency than two other known methods. We also introduce a simple but powerful representation of positive and unlabelled data under the Selected Completely at Random assumption, which straightforwardly yields most properties of such a model.
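The abstract does not spell out the likelihood being maximised, so the following is only a minimal sketch under a common reading of the Selected Completely at Random assumption: the observed label s satisfies P(s=1 | x) = c * P(y=1 | x), where c is the label frequency and P(y=1 | x) follows a logistic model. The function name `pu_neg_log_likelihood`, the toy data, and the coefficient values are hypothetical and are not taken from the paper. Fixing c = 1 reduces the objective to the naive fit, while treating c as a free parameter alongside the coefficients corresponds to the simultaneous estimation the abstract describes.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def pu_neg_log_likelihood(beta, c, X, s):
    """Negative log-likelihood of the observed labels s.

    Assumption (not stated in the abstract): under SCAR,
    P(s=1 | x) = c * sigmoid(beta[0] + beta[1:] . x),
    with c = P(s=1 | y=1) the label frequency.
    Setting c = 1 treats every unlabelled example as negative (naive fit).
    """
    eps = 1e-12  # keeps the logarithms finite
    total = 0.0
    for x_i, s_i in zip(X, s):
        lin = beta[0] + sum(b * v for b, v in zip(beta[1:], x_i))
        p = c * sigmoid(lin)
        p = min(max(p, eps), 1.0 - eps)
        total -= s_i * math.log(p) + (1 - s_i) * math.log(1.0 - p)
    return total


# Hypothetical toy data: one feature; s = 1 marks a labelled positive,
# s = 0 marks an unlabelled example (positive or negative).
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]]
s = [0, 0, 0, 1, 0, 1]
beta = [0.0, 1.0]

naive_nll = pu_neg_log_likelihood(beta, 1.0, X, s)  # c fixed at 1
joint_nll = pu_neg_log_likelihood(beta, 0.6, X, s)  # c treated as a parameter
```

In practice this objective would be passed to a numerical optimiser over both `beta` and `c` (with c constrained to the interval (0, 1]); the optimiser is left out here to keep the sketch dependency-free.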
Pages: 3-17
Number of pages: 15