Classification of histogram-valued data with support histogram machines

Times cited: 1
Authors
Kang, Ilsuk [1]
Park, Cheolwoo [2]
Yoon, Young Joo [3]
Park, Changyi [4]
Kwon, Soon-Sun [5]
Choi, Hosik [6]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
[2] Korea Adv Inst Sci & Technol, Dept Math Sci, Daejeon, South Korea
[3] Korea Natl Univ Educ, Dept Math Educ, Cheongju, South Korea
[4] Univ Seoul, Dept Stat, Seoul, South Korea
[5] Ajou Univ, Dept Math, Suwon, South Korea
[6] Univ Seoul, Grad Sch, Dept Urban Big Data Convergence, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
Support vector machines; symbolic data; Wasserstein-Kantorovich metric; vector machines; dissimilarity measures; regularization; regression
DOI
10.1080/02664763.2021.1947996
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Large volumes of data and advances in technology have produced new types of complex data, such as histogram-valued data. This paper focuses on classification problems in which predictors are observed as, or aggregated into, histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values such as the mean or median. However, this approach discards the distributional information that histograms carry. To address this issue, we propose a margin-based classifier for histogram-valued data, called the support histogram machine (SHM). We adopt the support vector machine framework and use the Wasserstein-Kantorovich metric to measure distances between histograms. The resulting optimization problem is solved via a dual approach. We then evaluate SHM on simulated and real examples and demonstrate that it outperforms summary-value-based methods.
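The abstract's point that summary values discard distributional information can be made concrete with a small sketch. This is not the authors' SHM implementation. It assumes the order-2 (L2) Wasserstein distance, computed as the integrated squared difference of the two quantile functions, and assumes values are spread uniformly within each histogram bin. The helper names `quantile`, `wasserstein2`, and `hist_mean` are hypothetical.

```python
def quantile(edges, probs, t):
    """Inverse CDF of a histogram at level t in [0, 1].

    edges: increasing bin boundaries (one more value than probs)
    probs: bin probabilities summing to 1
    Assumption: values are spread uniformly within each bin.
    """
    cum = 0.0
    for i, p in enumerate(probs):
        if p > 0 and t <= cum + p:
            # Linear interpolation inside bin i
            frac = (t - cum) / p
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cum += p
    return edges[-1]  # guard against rounding when t is close to 1


def wasserstein2(h1, h2, n_grid=10_000):
    """Order-2 Wasserstein distance between two histograms.

    Uses W2^2 = integral over t in [0, 1] of (Q1(t) - Q2(t))^2,
    approximated with the midpoint rule on n_grid points.
    """
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        d = quantile(*h1, t) - quantile(*h2, t)
        total += d * d
    return (total / n_grid) ** 0.5


def hist_mean(edges, probs):
    """Mean of a histogram: bin midpoints weighted by bin probabilities."""
    return sum(p * (a + b) / 2 for p, a, b in zip(probs, edges[:-1], edges[1:]))


# Two histograms with the same mean but different spread
h_wide = ([0.0, 10.0], [1.0])    # uniform on [0, 10]
h_narrow = ([4.0, 6.0], [1.0])   # uniform on [4, 6]

print(hist_mean(*h_wide), hist_mean(*h_narrow))   # 5.0 5.0
print(round(wasserstein2(h_wide, h_narrow), 3))   # 2.309
```

A mean-based feature cannot separate these two histograms, while the Wasserstein distance does (exactly sqrt(16/3) here, since the quantile functions are 10t and 4 + 2t). SHM uses a Wasserstein-Kantorovich metric within the SVM framework; how it does so and how the dual problem is solved is specified in the paper, not in this sketch.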
Pages: 675-690
Number of pages: 16