Multiple Bayesian discriminant functions for high-dimensional massive data classification

Authors
Jianfei Zhang
Shengrui Wang
Lifei Chen
Patrick Gallinari
Affiliations
[1] Université de Sherbrooke, ProspectUS Laboratoire, Département d’Informatique
[2] Fujian Normal University, School of Mathematics and Computer Science
[3] Université Pierre et Marie Curie, Laboratoire d’Informatique de Paris 6 (LIP6)
Keywords
Decision boundaries; Naive Bayes; Feature weighting; High-dimensional massive data; Class dispersion
Abstract
The presence of complex distributions of samples concealed in high-dimensional, massive sample-size data challenges all of the current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, revealing the class dispersion. Current classifiers applied to such complex data inherently suffer from either high complexity or weak classification ability, due to the imbalance between flexibility and generalization ability of the discriminant functions used by these classifiers. To address this concern, we propose a novel representation of discriminant functions in Bayesian inference, which allows multiple Bayesian decision boundaries per class, each in its individual subspace. For this purpose, we design a learning algorithm that incorporates the naive Bayes and feature weighting approaches into structural risk minimization to learn multiple Bayesian discriminant functions for each class, thus combining the simplicity and effectiveness of naive Bayes and the benefits of feature weighting in handling high-dimensional data. The proposed learning scheme affords a recursive algorithm for exploring class density distribution for Bayesian estimation, and an automated approach for selecting powerful discriminant functions while keeping the complexity of the classifier low. Experimental results on real-world data characterized by millions of samples and features demonstrate the promising performance of our approach.
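The idea of several discriminant functions per class, each with its own feature weights, can be illustrated with a minimal sketch. This is not the authors' learning algorithm: the abstract gives no formulas, so the Gaussian likelihood, the inverse-variance feature weights, the pre-partitioned sample groups, and all names (`fit_component`, `fit`, `discriminant`, `predict`) are assumptions made for illustration only. In particular, the paper learns weights through structural risk minimization and finds the groups recursively, neither of which is reproduced here.

```python
import numpy as np

def fit_component(X, prior, eps=1e-6):
    """Estimate one feature-weighted Gaussian naive Bayes component."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0) + eps  # eps avoids division by zero
    # Assumption: weight each feature by its inverse variance within the
    # group, normalized to sum to 1 (a stand-in for learned weights).
    w = 1.0 / var
    w = w / w.sum()
    return {"prior": prior, "mean": mean, "var": var, "w": w}

def fit(groups_by_class):
    """Fit one component per group; groups are assumed to be given."""
    n_total = sum(len(g) for groups in groups_by_class.values() for g in groups)
    return {
        label: [fit_component(g, len(g) / n_total) for g in groups]
        for label, groups in groups_by_class.items()
    }

def discriminant(x, comp):
    """Log prior plus feature-weighted sum of per-feature Gaussian log densities."""
    log_pdf = -0.5 * (np.log(2 * np.pi * comp["var"])
                      + (x - comp["mean"]) ** 2 / comp["var"])
    return np.log(comp["prior"]) + np.dot(comp["w"], log_pdf)

def predict(x, model):
    """Pick the class whose best-scoring discriminant function is largest."""
    scores = {
        label: max(discriminant(x, comp) for comp in comps)
        for label, comps in model.items()
    }
    return max(scores, key=scores.get)

# Toy data: class "a" is dispersed over two separate regions,
# class "b" sits between them.
groups = {
    "a": [
        [[-5, 0], [-5, 1], [-4, 0], [-4, 1]],
        [[5, 0], [5, 1], [4, 0], [4, 1]],
    ],
    "b": [
        [[0, 0], [0, 1], [1, 0], [-1, 1]],
    ],
}
model = fit(groups)
```

In this toy setting a single Gaussian fitted to all of class "a" would be centred on the region occupied by class "b"; giving "a" two discriminant functions lets each one describe its own region.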
Pages: 465-501
Number of pages: 36