Regularized Feature Selection in Categorical PLS for Multicollinear Data

Cited by: 1
Author
Mehmood, Tahir [1]
Affiliation
[1] NUST, SNS, Islamabad, Pakistan
Keywords
PARTIAL LEAST-SQUARES; BALOCHISTAN;
DOI
10.1155/2021/5561752
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
This article presents an algorithm that models categorical multicollinear data while balancing model accuracy on test data against the number of features selected in the model. Multicollinear data are generated in all scientific fields, and in such data some variables are noise while others are influential with respect to the response variable. In mathematical and statistical modeling of public health data, both the features and the response are often categorical. These datasets are usually collinear, which makes partial least squares (PLS) a candidate method; however, PLS does not perform feature selection by default and handles only quantitative features. Categorical PLS (Cat-PLS) has recently been introduced. We implement regularized feature selection in Cat-PLS, combining filter-based feature selection with categorical association measures: Cramer's V, the Phi coefficient, Tschuprow's T coefficient, the contingency coefficient, Yule's Q, and Yule's Y. A Monte Carlo simulation with 100 runs indicates that Cramer's V*VIP is the better choice in terms of model performance, number of selected features, and interpretability for modeling stillbirths, which is taken as the case study. The framework can be used in related areas to explore and model similar data structures.
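The association measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions based on the chi-square statistic of a contingency table and is not the author's implementation. The function names (contingency_table, chi_square, association_measures) are hypothetical, the sketch assumes each variable has at least two observed levels, and it does not cover the Cat-PLS model or the VIP weighting, which this record does not specify.

```python
import math
from collections import Counter


def contingency_table(x, y):
    """Cross-tabulate two categorical sequences into a list of rows."""
    rows, cols = sorted(set(x)), sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Return the Pearson chi-square statistic and the total count n."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_measures(table):
    """Textbook association measures for an r x c table (r, c >= 2)."""
    chi2, n = chi_square(table)
    r, c = len(table), len(table[0])
    out = {
        # Unsigned phi, computed from chi-square.
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * min(r - 1, c - 1))),
        "tschuprow_t": math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1)))),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }
    if r == 2 and c == 2:
        # Yule's Q and Y are defined for 2 x 2 tables only.
        (a, b), (c2, d) = table
        ad, bc = a * d, b * c2
        out["yule_q"] = (ad - bc) / (ad + bc)
        out["yule_y"] = (math.sqrt(ad) - math.sqrt(bc)) / (
            math.sqrt(ad) + math.sqrt(bc)
        )
    return out


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the case study.
    for name, value in association_measures([[10, 20], [30, 40]]).items():
        print(f"{name}: {value:.4f}")
```

In a filter-based scheme of the kind the abstract describes, a score such as Cramer's V between each feature and the response could be used to rank features before modeling; how that score is combined with VIP is left to the original article.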
Pages: 8