Classification and feature selection methods based on fitting logistic regression to PU data

Cited by: 1
Authors
Furmanczyk, Konrad [1]
Paczutkowski, Kacper [1]
Dudzinski, Marcin [1]
Dziewa-Dawidczyk, Diana [1]
Affiliation
[1] Warsaw Univ Life Sci, Inst Informat Technol, Warsaw, Poland
Keywords
Positive unlabeled learning; Logistic regression; Empirical risk minimization; Thresholded Lasso; Mutual information-based feature selection
DOI
10.1016/j.jocs.2023.102095
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We examine classification methods for positive and unlabeled (PU) data in which the conditional distribution of the true class label given the feature vector follows a logistic regression model. Our first objective is to compute and compare selected metrics for assessing the quality of these methods. To this end, we investigate four methods of posterior probability estimation that minimize the risk of the logistic loss function: the naive approach, the weighted likelihood approach, and two recently proposed methods, the joint approach and the LassoJoint method. The evaluations are carried out for 13 machine learning models on selected low- and high-dimensional datasets. Some of these model schemes are taken directly from the literature, while others are obtained by modifying existing procedures. Our second goal is to identify the most stable and efficient approach to posterior probability estimation. In addition, we use the AdaSampling scheme to compare the considered classification methods. We also compare two feature selection procedures: the mutual information-based feature selection method and the LassoJoint approach. This article extends the conference paper Furmanczyk et al. (2022).
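The abstract contrasts the naive approach with the joint approach to posterior probability estimation. The sketch below illustrates that contrast in plain Python under an assumption not stated above: the SCAR (selected completely at random) condition, under which P(S = 1 | x) = c · P(Y = 1 | x), where S indicates a labelled example and c is the labelling frequency. The naive fit regresses S directly on x, so it estimates P(S = 1 | x); the joint fit estimates c together with the logistic coefficients by maximising the likelihood of S, and σ(b0 + b·x) is then read as an estimate of P(Y = 1 | x). The function names (simulate_scar, fit, log_lik), the gradient-ascent optimiser, the learning rate and the simulated data are hypothetical choices for illustration, not the authors' implementation; the weighted likelihood and LassoJoint variants are not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(b0, b, x):
    """Linear predictor b0 + b.x."""
    return b0 + sum(bj * xj for bj, xj in zip(b, x))


def simulate_scar(n, beta0, beta, c, seed=0):
    """Hypothetical PU data: Y follows a logistic model and each positive is
    labelled (S = 1) independently with probability c (assumed SCAR)."""
    rng = random.Random(seed)
    X, S = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        y = rng.random() < sigmoid(linear(beta0, beta, x))
        s = y and rng.random() < c
        X.append(x)
        S.append(1 if s else 0)
    return X, S


def log_lik(X, S, b0, b, c):
    """Average Bernoulli log-likelihood of S under P(S=1|x) = c * sigmoid(b0 + b.x)."""
    total = 0.0
    for x, s in zip(X, S):
        q = c * sigmoid(linear(b0, b, x))
        total += math.log(q) if s else math.log(1.0 - q)
    return total / len(X)


def fit(X, S, joint=True, lr=0.5, iters=200):
    """Plain gradient ascent on log_lik (hypothetical helper).

    joint=False: naive fit, c fixed at 1, i.e. ordinary logistic regression of S on x.
    joint=True:  c = sigmoid(t) is estimated together with (b0, b).
    """
    n, p = len(X), len(X[0])
    b0, b, t = 0.0, [0.0] * p, 0.0
    for _ in range(iters):
        c = sigmoid(t) if joint else 1.0
        g0, gb, gt = 0.0, [0.0] * p, 0.0
        for x, s in zip(X, S):
            eta = sigmoid(linear(b0, b, x))
            q = c * eta
            dq = 1.0 / q if s else -1.0 / (1.0 - q)  # d log-lik / d q
            common = dq * c * eta * (1.0 - eta)       # chain rule through b0 + b.x
            g0 += common
            for j in range(p):
                gb[j] += common * x[j]
            gt += dq * eta * c * (1.0 - c)            # chain rule through c = sigmoid(t)
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, gb)]
        if joint:
            t += lr * gt / n
    return b0, b, (sigmoid(t) if joint else 1.0)


# Simulated example: only the first feature is informative, true c = 0.5.
X, S = simulate_scar(800, 0.0, [2.0, 0.0], c=0.5, seed=1)
b0_n, b_n, _ = fit(X, S, joint=False)     # naive: estimates P(S=1|x), not P(Y=1|x)
b0_j, b_j, c_hat = fit(X, S, joint=True)  # joint: estimates c and the logistic coefficients
# Joint-fit estimate of the posterior P(Y=1|x) at a new point:
x_new = [1.0, 0.0]
post_joint = sigmoid(linear(b0_j, b_j, x_new))
```

Parameterising c through a logistic transform keeps the estimate inside (0, 1); a practical implementation would more likely use a quasi-Newton optimiser than the fixed-step gradient ascent assumed here.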
Pages: 11
Related papers (50 in total)
  • [1] Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data
    Furmanczyk, Konrad
    Paczutkowski, Kacper
    Dudzinski, Marcin
    Dziewa-Dawidczyk, Diana
    [J]. COMPUTATIONAL SCIENCE - ICCS 2022, PT I, 2022 : 31 - 45
  • [2] Logistic regression for feature selection and soft classification of remote sensing data
    Cheng, Qi
    Varshney, Pramod K.
    Arora, Manoj K.
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2006, 3 (04) : 491 - 494
  • [3] Feature Selection Based on Logistic Regression for 2-Class Classification of Multidimensional Molecular Data
    Student, Sebastian
    Luciennik, Alicja P.
    Jakubczak, Michal
    Fujarewicz, Krzysztof
    [J]. ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, AIMSA 2018, 2018, 11089 : 286 - 290
  • [4] Multinomial logistic regression-based feature selection for hyperspectral data
    Pal, Mahesh
    [J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2012, 14 (01) : 214 - 220
  • [5] Classification of Real Imbalanced Cardiovascular Data Using Feature Selection and Sampling Methods: A Case Study with Neural Networks and Logistic Regression
    Bektas, Jale
    Ibrikci, Turgay
    Ozcan, Ismail Turkay
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (06)
  • [6] Linear regression-based feature selection for microarray data classification
    Hasan, Md Abid
    Hasan, Md Kamrul
    Mottalib, M. Abdul
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 11 (02) : 167 - 179
  • [7] Ensemble Logistic Regression for Feature Selection
    Zakharov, Roman
    Dupont, Pierre
    [J]. PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 133 - 144
  • [8] Genetic algorithm with logistic regression feature selection for Alzheimer's disease classification
    Divya, R.
    Kumari, R. Shantha Selva
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (14) : 8435 - 8444
  • [9] A Predictive Alignment-free Method based on Logistic Regression for Feature Selection and Classification of Protein Sequences
    Goncalves Marinho Couto, Braulio Roberto
    Santoro, Marcelo Matos
    Ladeira, Ana Paula
    dos Santos, Marcos A.
    [J]. BIOINFORMATICS 2013: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS, 2013 : 171 - 177