Regularized Feature Selection in Categorical PLS for Multicollinear Data

Cited by: 1
Authors
Mehmood, Tahir [1]
Affiliation
[1] NUST, SNS, Islamabad, Pakistan
Keywords
PARTIAL LEAST-SQUARES; BALOCHISTAN;
DOI
10.1155/2021/5561752
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
This article presents an algorithm that models categorical multicollinear data by balancing model accuracy on test data against the number of features selected in the model. Multicollinear data are generated in all scientific fields, where some variables are noise and others are influential with respect to the response variable. In mathematical and statistical modeling of public health data, both the features and the response are often categorical. These datasets are usually collinear, which makes partial least squares (PLS) a candidate method; however, PLS does not perform feature selection by default and is designed for quantitative features. Categorical PLS (Cat-PLS) was introduced recently. We have implemented regularized feature selection in Cat-PLS, using filter-based feature selection and a categorical mean through Cramer's V, the Phi coefficient, Tschuprow's T coefficient, the contingency coefficient, and Yule's Q and Yule's Y. A Monte Carlo simulation with 100 runs indicates that Cramer V*VIP is the better choice in terms of model performance, number of selected features, and interpretability for modeling stillbirths, which is taken as the case study. The framework can be used in related areas to explore and model similar data structures.
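As a rough illustration of the association measures named in the abstract, the Python sketch below computes the Phi coefficient, Cramer's V, Tschuprow's T, the contingency coefficient, and (for 2x2 tables only) Yule's Q and Yule's Y from two categorical vectors. It follows the standard textbook definitions based on the chi-square statistic. The function names are hypothetical, and the abstract does not say how these measures are combined with VIP inside Cat-PLS, so that step is not shown.

```python
import math
from collections import Counter


def contingency_table(x, y):
    """Cross-tabulate two equal-length categorical sequences into counts."""
    rows, cols = sorted(set(x)), sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Pearson chi-square statistic and sample size for a count table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_measures(x, y):
    """Chi-square-based association measures between two categorical vectors."""
    table = contingency_table(x, y)
    r, c = len(table), len(table[0])
    if min(r, c) < 2:
        # A constant variable carries no association information.
        return {"phi": 0.0, "cramers_v": 0.0,
                "tschuprows_t": 0.0, "contingency_c": 0.0}
    chi2, n = chi_square(table)
    phi2 = chi2 / n
    return {
        "phi": math.sqrt(phi2),
        "cramers_v": math.sqrt(phi2 / min(r - 1, c - 1)),
        "tschuprows_t": math.sqrt(phi2 / math.sqrt((r - 1) * (c - 1))),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }


def yule_q_y(table):
    """Yule's Q and Yule's Y; defined for 2x2 tables only."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("Yule's Q and Y require a 2x2 table")
    (a, b), (c, d) = table
    ad, bc = a * d, b * c
    if ad + bc == 0:
        return float("nan"), float("nan")  # undefined in this case
    q = (ad - bc) / (ad + bc)
    y = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    return q, y


if __name__ == "__main__":
    feature = ["a", "a", "b", "b"]
    response = ["u", "u", "v", "v"]
    print(association_measures(feature, response))
    print(yule_q_y(contingency_table(feature, response)))
```

In a filter-based setting, a score such as `cramers_v` would be computed for each feature against the response and used to rank or threshold the features; the thresholding rule here is left open because the abstract does not state one.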
Pages: 8
Related papers
50 in total
  • [31] Feature Selection for Adaptive Dual-Graph Regularized Concept Factorization for Data Representation
    Ye, Jun
    Jin, Zhong
    NEURAL PROCESSING LETTERS, 2017, 45 (02) : 667 - 688
  • [32] A L1-regularized feature selection method for local dimension reduction on microarray data
    Guo, Shun
    Guo, Donghui
    Chen, Lifei
    Jiang, Qingshan
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2017, 67 : 92 - 101
  • [33] Feature Selection for Adaptive Dual-Graph Regularized Concept Factorization for Data Representation
    Jun Ye
    Zhong Jin
    Neural Processing Letters, 2017, 45 : 667 - 688
  • [34] Dual graph regularized compact feature representation for unsupervised feature selection
    Li, Shaoyong
    Tang, Chang
    Liu, Xinwang
    Liu, Yaping
    Chen, Jiajia
    NEUROCOMPUTING, 2019, 331 : 77 - 96
  • [35] Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data
    Li, Shanshan
    Yu, Jian
    Kang, Huimin
    Liu, Jianfeng
    ANIMALS, 2022, 12 (18):
  • [36] A Mutual Information Based on Ant Colony Optimization Method to Feature Selection for Categorical Data Clustering
    Shojaee, Z.
    Fazeli, S. A. Shahzadeh
    Abbasi, E.
    Adibnia, F.
    Masuli, F.
    Rovetta, S.
    IRANIAN JOURNAL OF SCIENCE, 2023, 47 (01) : 175 - 186
  • [37] Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection
    Chen, Hui
    Xu, Kunpeng
    Chen, Lifei
    Jiang, Qingshan
    MATHEMATICS, 2021, 9 (14)
  • [38] A Mutual Information Based on Ant Colony Optimization Method to Feature Selection for Categorical Data Clustering
    Z. Shojaee
    S. A. Shahzadeh Fazeli
    E. Abbasi
    F. Adibnia
    F. Masuli
    S. Rovetta
    Iranian Journal of Science, 2023, 47 : 175 - 186
  • [39] Robust inner product regularized unsupervised feature selection
    Qian, Youcheng
    Yin, Xueyan
    Gao, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (23) : 33593 - 33615
  • [40] Unsupervised feature selection by regularized self-representation
    Zhu, Pengfei
    Zuo, Wangmeng
    Zhang, Lei
    Hu, Qinghua
    Shiu, Simon C. K.
    PATTERN RECOGNITION, 2015, 48 (02) : 438 - 446