Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein Distance

Authors
Antonio Irpino
Rosanna Verde
Affiliation
[1] Second University of Naples,Department of Political Sciences “J. Monnet”
Keywords
Modal symbolic variables; Probability distribution function; Histogram data; Regression; Wasserstein distance; 62J05; 62G30; 46F10;
Abstract
In this paper we present a new linear regression technique for distributional symbolic variables, i.e., variables whose realizations can be histograms, empirical distributions or empirical estimates of parametric distributions. Such data are known as numerical modal data according to the Symbolic Data Analysis definitions. In order to measure the error between the observed and the predicted distributions, the $\ell_2$ Wasserstein distance is proposed. Some properties of such a metric are exploited to predict the modal response variable as a linear combination of the explanatory modal variables. Based on the metric, the model uses the quantile functions associated with the data and thus is subject to a positivity constraint of the estimated parameters. We propose solving the linear regression problem by starting from a particular decomposition of the squared distance. Therefore, we estimate the model parameters according to two separate models, one for the averages of the data and one for the centered distributions by a constrained least squares algorithm. Measures of goodness-of-fit are also proposed and discussed. The method is validated by two applications, one on simulated data and one on two real-world datasets.
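The decomposition and the two-model estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each distribution is represented by its quantile function evaluated on a common equally spaced grid, that there is a single explanatory variable, and that the model has the form y_i(t) = beta0 + beta1 * mean(x_i) + gamma * (x_i(t) - mean(x_i)) with gamma >= 0. All function names (quantile_grid, w2_squared, decompose, fit_one_predictor) are hypothetical, and the data are synthetic.

```python
import random

def mean(v):
    return sum(v) / len(v)

def quantile_grid(sample, m=50):
    """Empirical quantile function at the m midpoints t_k = (k + 0.5) / m."""
    xs = sorted(sample)
    n = len(xs)
    return [xs[min(n - 1, int((k + 0.5) / m * n))] for k in range(m)]

def w2_squared(qx, qy):
    """Squared l2 Wasserstein distance, approximated on the quantile grid."""
    return mean([(a - b) ** 2 for a, b in zip(qx, qy)])

def decompose(qx, qy):
    """Split the squared distance into a mean part and a centered part."""
    mx, my = mean(qx), mean(qy)
    location = (mx - my) ** 2
    shape = mean([((a - mx) - (b - my)) ** 2 for a, b in zip(qx, qy)])
    return location, shape

def fit_one_predictor(QX, QY):
    """Two separate least squares fits (assumed single-predictor model).

    Model 1: ordinary least squares on the means gives beta0 and beta1.
    Model 2: least squares on the centered quantile functions gives gamma,
    clipped at zero; with one coefficient, clipping solves the
    non-negative least squares problem exactly.
    """
    xbar = [mean(q) for q in QX]
    ybar = [mean(q) for q in QY]
    mxx, myy = mean(xbar), mean(ybar)
    sxx = sum((a - mxx) ** 2 for a in xbar)
    sxy = sum((a - mxx) * (b - myy) for a, b in zip(xbar, ybar))
    beta1 = sxy / sxx
    beta0 = myy - beta1 * mxx
    num = den = 0.0
    for qx, qy, mx, my in zip(QX, QY, xbar, ybar):
        num += sum((a - mx) * (b - my) for a, b in zip(qx, qy))
        den += sum((a - mx) ** 2 for a in qx)
    gamma = max(0.0, num / den)
    return beta0, beta1, gamma

# Synthetic data: 20 predictor distributions with different means and spreads.
random.seed(0)
QX = [quantile_grid([random.gauss(i, 1 + 0.1 * i) for _ in range(200)])
      for i in range(20)]
# Noise-free responses generated from the assumed model
# with beta0 = 1, beta1 = 2, gamma = 0.5.
QY = [[1.0 + 2.0 * mean(q) + 0.5 * (a - mean(q)) for a in q] for q in QX]

total = w2_squared(QX[0], QY[0])
location, shape = decompose(QX[0], QY[0])
beta0, beta1, gamma = fit_one_predictor(QX, QY)
```

On an equally weighted grid the two parts sum exactly to the squared distance, which is why the means and the centered quantile functions can be fitted separately. With several explanatory variables, the clipping step would have to be replaced by a general non-negative least squares solver.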
Pages: 81-106
Page count: 25
Related papers
50 in total
  • [1] Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein Distance
    Irpino, Antonio
    Verde, Rosanna
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2015, 9 (01) : 81 - 106
  • [2] Ordinary Least Squares for Histogram Data Based on Wasserstein Distance
    Verde, Rosanna
    Irpino, Antonio
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 581 - 588
  • [3] LINEAR LEAST SQUARES REGRESSION
    WATSON, GS
    ANNALS OF MATHEMATICAL STATISTICS, 1967, 38 (06): : 1679 - &
  • [4] A least-squares approach to fuzzy linear regression analysis
    D'Urso, P
    Gastaldi, T
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2000, 34 (04) : 427 - 440
  • [5] Consistency of the total least squares estimator in the linear errors-in-variables regression
    Shklyar, Sergiy
    MODERN STOCHASTICS-THEORY AND APPLICATIONS, 2018, 5 (03): : 247 - 295
  • [6] Group least squares regression for linear models with strongly correlated predictor variables
    Min Tsao
    Annals of the Institute of Statistical Mathematics, 2023, 75 : 233 - 250
  • [7] ROBUST LINEAR LEAST SQUARES REGRESSION
    Audibert, Jean-Yves
    Catoni, Olivier
    ANNALS OF STATISTICS, 2011, 39 (05): : 2766 - 2794
  • [8] Linear least-squares regression
    Young, Sidney H.
    Wierzbicki, Andrzej
    Journal of Chemical Education, 2000, 77 (05)
  • [9] AGGREGATION OF VARIABLES IN LEAST-SQUARES REGRESSION
    LICHTENBERG, FR
    AMERICAN STATISTICIAN, 1990, 44 (02): : 169 - 171
  • [10] Nonlinear Least Squares Optimization of Constants in Symbolic Regression
    Kommenda, Michael
    Affenzeller, Michael
    Kronberger, Gabriel
    Winkler, Stephan M.
    COMPUTER AIDED SYSTEMS THEORY, PT 1, 2013, 8111 : 420 - 427