A clustering-based feature selection method for automatically generated relational attributes

Times cited: 6
Authors
Rezaei, Mostafa [1]
Cribben, Ivor [2]
Samorani, Michele [3]
Affiliations
[1] Univ Alberta, Alberta Sch Business, Operat & Informat Syst, Edmonton, AB T6G 2R6, Canada
[2] Univ Alberta, Alberta Sch Business, Finance & Stat Anal, Edmonton, AB T6G 2R6, Canada
[3] Santa Clara Univ, Leavey Sch Business, Informat Syst & Analyt, Santa Clara, CA 95053 USA
Keywords
Relational attribute generation; Feature selection; Lasso; Elastic net; Clustering; Variable selection; Regression shrinkage; Product returns; Regularization
DOI
10.1007/s10479-018-2830-2
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
Although data mining methods require a flat mining table as input, in many real-world applications analysts want to find patterns in a relational database. To this end, new methods and software have recently been developed that automatically add attributes (or features) to a target table of a relational database; these attributes summarize information from all the other tables. When attributes are constructed automatically by these methods, selecting the important ones is particularly difficult because many of them are highly correlated. In this setting, attribute selection techniques such as the Least Absolute Shrinkage and Selection Operator (lasso), the elastic net, and other machine learning methods tend to underperform. In this paper, we introduce a novel attribute selection procedure in which, after an initial screening step, we cluster the attributes into groups and apply the group lasso to select first the true attribute groups and then the true attributes. The procedure is particularly suited to high-dimensional data sets in which the attributes are highly correlated. We test our procedure on several simulated data sets and on a real-world data set from a marketing database. The results show that, compared with other state-of-the-art methods, our procedure achieves higher predictive performance while selecting a much smaller set of attributes.
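The pipeline described in the abstract (screening, clustering of correlated attributes, group lasso, then attribute-level selection) can be illustrated with a small plain-Python sketch. It rests on assumptions not stated in the record: screening keeps the k attributes with the largest absolute correlation with the target; a simple "leader" clustering on absolute correlation stands in for the paper's clustering step; the group lasso is solved by proximal gradient descent with block soft-thresholding; and the second stage reruns the same solver with singleton groups, which reduces it to the ordinary lasso. The function names (`screen`, `cluster`, `group_lasso`, `select`), the values of k, tau, and lam, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch: names, thresholds, and data are illustrative assumptions.


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
    return [(v - mu) / sd for v in col]


def cov(a, b):
    """Mean of elementwise products; equals the correlation for standardized inputs."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def screen(cols, y, k):
    """Assumed screening rule: keep the k attributes most correlated with y."""
    ranked = sorted(range(len(cols)), key=lambda j: -abs(cov(cols[j], y)))
    return sorted(ranked[:k])


def cluster(cols, idx, tau):
    """Stand-in 'leader' clustering: join the first group whose leader has |corr| >= tau."""
    groups = []
    for j in idx:
        for g in groups:
            if abs(cov(cols[g[0]], cols[j])) >= tau:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups


def group_lasso(cols, y, groups, lam, iters=2000):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2."""
    idx = [j for g in groups for j in g]
    gram = {(a, b): cov(cols[a], cols[b]) for a in idx for b in idx}  # X^T X / n
    xty = {j: cov(cols[j], y) for j in idx}                            # X^T y / n
    step = 1.0 / len(idx)  # 1 / trace(X^T X / n) bounds the Lipschitz constant
    beta = {j: 0.0 for j in idx}
    for _ in range(iters):
        grad = {a: sum(gram[a, b] * beta[b] for b in idx) - xty[a] for a in idx}
        for g in groups:
            v = [beta[j] - step * grad[j] for j in g]
            norm = math.sqrt(sum(x * x for x in v))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j, x in zip(g, v):
                beta[j] = scale * x  # block soft-thresholding zeroes whole groups
    return beta


def select(cols, y, k=5, tau=0.8, lam=0.2):
    xs = [standardize(c) for c in cols]
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]  # centered target, so no intercept is needed
    kept = screen(xs, yc, k)
    groups = cluster(xs, kept, tau)
    beta = group_lasso(xs, yc, groups, lam)
    chosen = [g for g in groups if any(beta[j] != 0.0 for j in g)]
    # Second stage (assumed): singleton groups reduce the group lasso to the lasso.
    within = [[j] for g in chosen for j in g]
    beta2 = group_lasso(xs, yc, within, lam) if within else {}
    return chosen, sorted(j for j, b in beta2.items() if b != 0.0)


# Synthetic data: attributes 0-2 are noisy copies of the signal z1,
# attributes 3-4 are noisy copies of an unrelated factor z2, attribute 5 is noise.
rng = random.Random(0)
n = 200
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
cols = [[z + 0.2 * rng.gauss(0, 1) for z in z1] for _ in range(3)]
cols += [[z + 0.2 * rng.gauss(0, 1) for z in z2] for _ in range(2)]
cols.append([rng.gauss(0, 1) for _ in range(n)])
y = [2 * a + 0.5 * rng.gauss(0, 1) for a in z1]
groups, attrs = select(cols, y)
print(groups, attrs)
```

On this synthetic data, the group stage should retain only the group formed from attributes 0-2, and the lasso stage then picks a subset of those highly correlated copies. In practice, lam (and the screening size k) would be tuned, for example by cross-validation, rather than fixed as here.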
Pages: 233-263
Number of pages: 31
Related papers (50 in total; first 10 shown)
  • [1] A clustering-based feature selection method for automatically generated relational attributes
    Rezaei, Mostafa
    Cribben, Ivor
    Samorani, Michele
    Annals of Operations Research, 2021, 303: 233-263
  • [2] A clustering-based feature selection via feature separability
    Jiang, Shengyi
    Wang, Lianxi
    Journal of Intelligent & Fuzzy Systems, 2016, 31(02): 927-937
  • [3] Clustering-based feature selection for verb sense disambiguation
    Chen, JY
    Palmer, M
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005: 36-41
  • [4] Clustering-based Feature Selection for Internet Attack Defense
    Seo, Jungtaek
    Kim, Jungtae
    Moon, Jongsub
    Kang, Boo Jung
    Im, Eul Gyu
    International Journal of Future Generation Communication and Networking, 2008, 1(01): 91-98
  • [5] Feature Selection and Overlapping Clustering-Based Multilabel Classification Model
    Peng, Liwen
    Liu, Yongguo
    Mathematical Problems in Engineering, 2018, 2018
  • [6] An Experimental Study on Unsupervised Clustering-based Feature Selection Methods
    Covoes, Thiago F.
    Hruschka, Eduardo R.
    2009 9th International Conference on Intelligent Systems Design and Applications, 2009: 993-1000
  • [7] Clustering-based Feature Selection in Semi-supervised Problems
    Quinzan, Ianisse
    Sotoca, Jose M.
    Pla, Filiberto
    2009 9th International Conference on Intelligent Systems Design and Applications, 2009: 535-540
  • [8] A New Clustering-Based Method for Protein Structure Selection
    Wang, Qingguo
    Shang, Yi
    Xu, Dong
    2008 IEEE International Joint Conference on Neural Networks, Vols. 1-8, 2008: 2891-2898
  • [9] Clustering-Based Feature Selection for Content Based Remote Sensing Image Retrieval
    Li, Shijin
    Zhu, Jiali
    Feng, Jun
    Wan, Dingsheng
    Image Analysis and Recognition, Pt. I, 2012, 7324: 427-435
  • [10] Spectral Clustering-based Local and Global Structure Preservation for Feature Selection
    Zhou, Sihang
    Liu, Xinwang
    Zhu, Chengzhang
    Liu, Qiang
    Yin, Jianping
    Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), 2014: 550-557