Classification and feature selection methods based on fitting logistic regression to PU data

Cited by: 1
Authors
Furmanczyk, Konrad [1]
Paczutkowski, Kacper [1]
Dudzinski, Marcin [1]
Dziewa-Dawidczyk, Diana [1]
Affiliations
[1] Warsaw Univ Life Sci, Inst Informat Technol, Warsaw, Poland
Keywords
Positive unlabeled learning; Logistic regression; Empirical risk minimization; Thresholded Lasso; Mutual information-based feature selection
DOI
10.1016/j.jocs.2023.102095
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We examine classification methods for positive and unlabeled (PU) data in which the conditional distribution of the true class label given the feature vector follows a logistic regression model. Our first objective is to compute and compare selected metrics for assessing the quality of these methods. In this context, we investigate four methods of posterior probability estimation in which the risk of the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and two more recently proposed methods, the joint approach and the LassoJoint method. The evaluations are performed for 13 machine learning models on selected low- and high-dimensional datasets. Some of these model schemes have been borrowed directly from the literature, and some have been obtained by modifying existing procedures. Our second goal is to establish the most stable and efficient approach to posterior probability estimation. Moreover, we use the AdaSampling scheme to compare the considered classification methods. We also compare feature selection procedures: the Mutual Information-Based feature selection method and the LassoJoint approach. The current article is an enhancement of the conference paper Furmanczyk et al. (2022).
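As an illustration only, the naive approach named in the abstract can be sketched as follows. This is not the authors' implementation. It assumes the "selected completely at random" labeling setting (each positive is labeled with a constant probability `c`), which the abstract does not state, and it uses a single simulated feature. The helper names `fit_logistic` and `simulate_pu` and all parameter values are hypothetical. The sketch fits logistic regression to the observed label indicator `s` as if it were the true class `y`, and compares the result with an oracle fit on `y`, which is unavailable in real PU data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, targets, lr=0.5, epochs=300):
    """Minimize the mean logistic loss for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            err = sigmoid(w * x + b) - t
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def simulate_pu(n=2000, c=0.5, seed=0):
    """Simulate PU data: y follows a logistic model, s marks labeled positives.

    Assumption (not from the abstract): a positive is labeled with constant
    probability c, and negatives are never labeled.
    """
    rng = random.Random(seed)
    xs, ys, ss = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(2.0 * x) else 0
        s = 1 if (y == 1 and rng.random() < c) else 0
        xs.append(x)
        ys.append(y)
        ss.append(s)
    return xs, ys, ss


xs, ys, ss = simulate_pu()

# Naive approach: treat every unlabeled example as negative, i.e. fit on s.
w_naive, b_naive = fit_logistic(xs, ss)
# Oracle fit on the true labels, shown only for comparison.
w_oracle, b_oracle = fit_logistic(xs, ys)

naive_mean = sum(sigmoid(w_naive * x + b_naive) for x in xs) / len(xs)
oracle_mean = sum(sigmoid(w_oracle * x + b_oracle) for x in xs) / len(xs)

print("mean naive posterior: ", round(naive_mean, 3))
print("mean oracle posterior:", round(oracle_mean, 3))
```

In this simulated setting the naive fit targets the probability of being labeled rather than the probability of being positive, so its estimated posteriors are lower on average than the oracle's. The weighted likelihood, joint, and LassoJoint methods are not shown here.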
Pages: 11