Multiple Bayesian discriminant functions for high-dimensional massive data classification

Cited by: 0
Authors
Jianfei Zhang
Shengrui Wang
Lifei Chen
Patrick Gallinari
Affiliations
[1] Université de Sherbrooke, ProspectUS Laboratoire, Département d’Informatique
[2] Fujian Normal University, School of Mathematics and Computer Science
[3] Université Pierre et Marie Curie, Laboratoire d’Informatique de Paris 6 (LIP6)
Source
Keywords
Decision boundaries; Naive Bayes; Feature weighting; High-dimensional massive data; Class dispersion
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The presence of complex distributions of samples concealed in high-dimensional, massive sample-size data challenges all of the current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, revealing the class dispersion. Current classifiers applied to such complex data inherently suffer from either high complexity or weak classification ability, due to the imbalance between flexibility and generalization ability of the discriminant functions used by these classifiers. To address this concern, we propose a novel representation of discriminant functions in Bayesian inference, which allows multiple Bayesian decision boundaries per class, each in its individual subspace. For this purpose, we design a learning algorithm that incorporates the naive Bayes and feature weighting approaches into structural risk minimization to learn multiple Bayesian discriminant functions for each class, thus combining the simplicity and effectiveness of naive Bayes and the benefits of feature weighting in handling high-dimensional data. The proposed learning scheme affords a recursive algorithm for exploring class density distribution for Bayesian estimation, and an automated approach for selecting powerful discriminant functions while keeping the complexity of the classifier low. Experimental results on real-world data characterized by millions of samples and features demonstrate the promising performance of our approach.
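The idea of several Bayesian discriminant functions per class, each in its own feature subspace, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the functional form, so the sketch assumes Gaussian naive Bayes likelihoods, per-feature weights that scale each feature's log-likelihood, and a decision rule that picks the class whose best-scoring function wins. The names `Component`, `weighted_log_discriminant`, `classify`, and `MODEL` are hypothetical, and all parameters are set by hand; the learning step (structural risk minimization, recursive density exploration, automatic selection of functions) is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Component:
    """One hypothetical discriminant function in its own weighted subspace."""
    prior: float
    means: Sequence[float]
    variances: Sequence[float]
    weights: Sequence[float]  # assumed non-negative; 0 drops the feature


def weighted_log_discriminant(x: Sequence[float], c: Component) -> float:
    """log prior + sum_j w_j * log N(x_j | mean_j, var_j) (assumed form)."""
    score = math.log(c.prior)
    for xj, mu, var, w in zip(x, c.means, c.variances, c.weights):
        log_pdf = -0.5 * math.log(2.0 * math.pi * var) - (xj - mu) ** 2 / (2.0 * var)
        score += w * log_pdf
    return score


def classify(x: Sequence[float], model: Dict[str, List[Component]]) -> str:
    """Return the label of the class owning the highest-scoring component."""
    best_label, best_score = "", -math.inf
    for label, components in model.items():
        for c in components:
            s = weighted_log_discriminant(x, c)
            if s > best_score:
                best_label, best_score = label, s
    return best_label


# Hand-set toy model: class "A" is dispersed over two regions, each visible
# in a different single-feature subspace; class "B" uses both features.
MODEL: Dict[str, List[Component]] = {
    "A": [
        Component(0.25, [0.0, 0.0], [1.0, 1.0], [1.0, 0.0]),
        Component(0.25, [0.0, 5.0], [1.0, 1.0], [0.0, 1.0]),
    ],
    "B": [
        Component(0.5, [3.0, 0.0], [1.0, 1.0], [0.5, 0.5]),
    ],
}

if __name__ == "__main__":
    for point in ([0.1, 9.0], [3.0, 0.0], [7.0, 5.0]):
        print(point, classify(point, MODEL))
```

In this toy setup a single naive Bayes function for class "A" would have to cover both regions at once; giving "A" two functions with disjoint feature weights lets each one ignore the feature that is irrelevant to its region.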
Pages: 465-501
Page count: 36