Regularized Feature Selection in Categorical PLS for Multicollinear Data

Author
Mehmood, Tahir [1]
Affiliation
[1] NUST, SNS, Islamabad, Pakistan
Keywords
PARTIAL LEAST-SQUARES; BALOCHISTAN;
DOI
10.1155/2021/5561752
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
This article presents an algorithm that models categorical multicollinear data by balancing model accuracy on test data against the number of features selected in the model. Multicollinear data are generated in all scientific fields; some variables are noise and some are influential with respect to the response variable. In mathematical and statistical modeling of public health data, both the features and the response are often categorical. Such datasets are usually collinear. Partial least squares (PLS) is a potential method for them, but by default it does not perform feature selection and it handles only quantitative features. Categorical PLS (Cat-PLS) was recently introduced. We have implemented regularized feature selection in Cat-PLS, where filter-based feature selection and a categorical mean through Cramer's V, the Phi coefficient, Tschuprow's T coefficient, the contingency coefficient, Yule's Q, and Yule's Y are used. Monte Carlo simulation with 100 runs indicates that Cramer's V*VIP is the better choice in terms of model performance, number of selected features, and interpretation for modeling stillbirths, which is taken as the case study. The framework can be used in related areas to explore and model related data structures.
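The association measures named in the abstract all have standard textbook definitions on a contingency table of counts. The following is a minimal sketch of those definitions only, not the article's Cat-PLS algorithm or its feature-selection procedure. The function names (`chi_square`, `association_measures`, `yule_q_y`) and the example tables are hypothetical and chosen for illustration; the sketch assumes every row and column total is nonzero.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic and total count for an r x c table of counts.

    Assumes every row total and column total is nonzero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_measures(table):
    """Chi-square-based association measures (standard definitions)."""
    chi2, n = chi_square(table)
    r, c = len(table), len(table[0])
    return {
        "phi": sqrt(chi2 / n),
        "cramers_v": sqrt(chi2 / (n * min(r - 1, c - 1))),
        "tschuprows_t": sqrt(chi2 / (n * sqrt((r - 1) * (c - 1)))),
        "contingency_c": sqrt(chi2 / (chi2 + n)),
    }


def yule_q_y(table):
    """Yule's Q and Yule's Y for a 2 x 2 table [[a, b], [c, d]].

    Assumes a*d + b*c is nonzero.
    """
    (a, b), (c, d) = table
    q = (a * d - b * c) / (a * d + b * c)
    y = (sqrt(a * d) - sqrt(b * c)) / (sqrt(a * d) + sqrt(b * c))
    return q, y


# Hypothetical 2 x 2 tables: one perfectly associated, one independent.
perfect = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(association_measures(perfect))
print(yule_q_y(independent))
```

For a 2 x 2 table, Phi, Cramer's V, and Tschuprow's T coincide, while the contingency coefficient stays below 1 even under perfect association; this is one reason the measures can rank features differently when used as filters.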
Pages: 8