Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein Distance

Cited by: 0
Authors
Antonio Irpino
Rosanna Verde
Affiliation
[1] Second University of Naples, Department of Political Sciences “J. Monnet”
Keywords
Modal symbolic variables; Probability distribution function; Histogram data; Regression; Wasserstein distance; 62J05; 62G30; 46F10;
DOI
Not available
Abstract
In this paper we present a new linear regression technique for distributional symbolic variables, i.e., variables whose realizations can be histograms, empirical distributions or empirical estimates of parametric distributions. Such data are known as numerical modal data according to the Symbolic Data Analysis definitions. In order to measure the error between the observed and the predicted distributions, the ℓ2 Wasserstein distance is proposed. Some properties of such a metric are exploited to predict the modal response variable as a linear combination of the explanatory modal variables. Based on the metric, the model uses the quantile functions associated with the data and thus is subject to a positivity constraint of the estimated parameters. We propose solving the linear regression problem by starting from a particular decomposition of the squared distance. Therefore, we estimate the model parameters according to two separate models, one for the averages of the data and one for the centered distributions by a constrained least squares algorithm. Measures of goodness-of-fit are also proposed and discussed. The method is validated by two applications, one on simulated data and one on two real-world datasets.
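The decomposition of the squared distance mentioned in the abstract can be illustrated with a minimal sketch. It assumes each distribution is given as a sample of equal size, so that the sorted values act as a discretized quantile function; the function names (`w2_squared`, `decompose`) and the toy data are hypothetical and do not come from the paper. The sketch only shows the distance and its split into a part for the means and a part for the centered distributions, not the regression estimator itself.

```python
from statistics import fmean


def w2_squared(x, y):
    """Squared l2 Wasserstein distance between two equal-sized samples.

    Assumption: with equal sample sizes, the sorted samples serve as
    quantile functions evaluated on a common equally spaced grid.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    qx, qy = sorted(x), sorted(y)
    return fmean([(a - b) ** 2 for a, b in zip(qx, qy)])


def decompose(x, y):
    """Split the squared distance into a mean part and a centered part."""
    mx, my = fmean(x), fmean(y)
    mean_part = (mx - my) ** 2
    # Distance between the distributions after removing their means.
    centered_part = w2_squared([a - mx for a in x], [b - my for b in y])
    return mean_part, centered_part


# Hypothetical toy data standing in for two observed distributions.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.0, 10.0]

total = w2_squared(x, y)
mean_part, centered_part = decompose(x, y)
print(total, mean_part, centered_part)
```

Because the two parts add up to the total, a model for the averages and a model for the centered quantile functions can be fitted separately, which is consistent with the two-model estimation the abstract describes.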
Pages: 81-106
Page count: 25