Parametrized linear regression for boxplot-multivalued data applied to the Brazilian Electric Sector

Times cited: 4
Authors
Reyes, Dailys M. A. [1]
Souza, Leandro C. [2]
de Souza, Renata M. C. R. [1]
de Oliveira, Adriano L. I. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Jornalista Anibal Fernandes S-N, Cidade Univ, BR-50740560 Recife, PE, Brazil
[2] Univ Fed Paraiba, Ctr Informat, R Escoteiros S-N, BR-58055000 Joao Pessoa, PB, Brazil
Keywords
Symbolic data analysis; Boxplot data; Linear regression; Quantile functions; MODEL; STATISTICS;
DOI
10.1016/j.ins.2023.119758
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Symbolic boxplot data can be regarded as a particular case of a numerical multi-valued variable. This kind of symbolic data is a useful exploratory tool with a simple structure for summarizing groups of numerical data. However, it has received little attention in the symbolic data analysis literature. In this paper, we propose a new prediction method for extracting knowledge from boxplot data. A parametrized regression approach automatically extracts the best reference points from the regressor variables. These reference points are then used to build five linear regression models, one for each value of the boxplot: minimum (m), lower quartile (Q1), median (Q2), upper quartile (Q3) and maximum (M). A strategy based on the Box-Cox transformation is applied to the response variable to guarantee the mathematical coherence of the predictions, so that the predicted values form a valid boxplot. Experimental evaluation with synthetic and real boxplot datasets illustrates the advantages of the proposed method. Moreover, the present work also focuses on the development of an application for predicting boxplot-based temperature data in the Brazilian Electric Sector.
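The idea of fitting five linear models while keeping the predicted boxplot ordered (m ≤ Q1 ≤ Q2 ≤ Q3 ≤ M) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the exact formulation, so the sketch assumes that the minimum is regressed directly and that the four gaps between consecutive boxplot values are regressed after a Box-Cox transformation with λ = 0 (the logarithm). It also uses all five regressor boxplot values as plain predictors instead of the paper's parametrized reference points. The function names `fit_boxplot_regression` and `predict_boxplot` are hypothetical.

```python
import numpy as np


def fit_boxplot_regression(X, Y):
    """Fit five least-squares models for boxplot-valued data (illustrative only).

    X, Y: arrays of shape (n, 5) holding (m, Q1, Q2, Q3, M) per observation,
    with strictly increasing values in each row of Y.
    Assumption: the response is re-expressed as the minimum plus four positive
    gaps, and the gaps are Box-Cox transformed with lambda = 0 (log).
    """
    A = np.column_stack([np.ones(len(X)), X])      # design matrix with intercept
    gaps = np.diff(Y, axis=1)                      # Q1-m, Q2-Q1, Q3-Q2, M-Q3
    targets = np.column_stack([Y[:, 0], np.log(gaps)])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coef                                    # shape (6, 5): one column per model


def predict_boxplot(coef, X):
    """Predict (m, Q1, Q2, Q3, M); the back-transform keeps the values ordered."""
    A = np.column_stack([np.ones(len(X)), X])
    Z = A @ coef
    gaps = np.exp(Z[:, 1:])                        # inverse Box-Cox (lambda = 0), always > 0
    upper = Z[:, [0]] + np.cumsum(gaps, axis=1)    # add the gaps back onto the minimum
    return np.column_stack([Z[:, 0], upper])


# Synthetic boxplot data, invented purely for this example.
rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=(50, 5)), axis=1)
Y = np.sort(2.0 * X + rng.normal(scale=0.1, size=(50, 5)), axis=1)

coef = fit_boxplot_regression(X, Y)
pred = predict_boxplot(coef, X)
```

Because the gaps are exponentiated before being accumulated, every predicted row is strictly increasing by construction, which is the kind of coherence the abstract attributes to the Box-Cox strategy.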
Pages: 16
Related papers
17 in total
  • [1] A parametrized approach for linear-regression of interval data
    Souza, Leandro C.
    Souza, Renata M. C. R.
    Amaral, Getulio J. A.
    Silva Filho, Telmo M.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 131 : 149 - 159
  • [2] Statistical Method for Finding Outliers in Multivariate Data using a Boxplot and Multiple Linear Regression
    Thanwiset, Theeraphat
    Srisodaphol, Wuttichai
    [J]. SAINS MALAYSIANA, 2023, 52 (09): 2725 - 2732
  • [3] Applied Linear Regression for Longitudinal Data: With an Emphasis on Missing Observations
    Marino, Maria Francesca
    Tan, Frans E. S.
    Jolani, Shahab
    [J]. AMERICAN STATISTICIAN, 2024, 78 (01):
  • [4] DATA POTENTIAL OF THE BRAZILIAN CENSUS SECTOR APPLIED TO THE MARKETING OF A FAST FOOD DELIVERY
    Prochnow, Ronan Max
    Oliveira, Francisco Henrique
    Oliveira, Rubens A.
    [J]. REVISTA GEOGRAFICA DE AMERICA CENTRAL, 2011, (47E):
  • [5] A Novel Remaining Useful Estimation Model to Assist Asset Renewal Decisions Applied to the Brazilian Electric Sector
    Santiago, Hemir da Cunha
    Cavalcanti, Jose Carlos da Silva
    Prudencio, Ricardo Bastos Cavalcante
    Mohamed, Mohamed A.
    Sarubbo, Leonie Asfora
    Converti, Attilio
    Marinho, Manoel Henrique da Nobrega
    [J]. ENERGIES, 2023, 16 (06)
  • [6] New Partially Linear Regression and Machine Learning Models Applied to Agronomic Data
    Rodrigues, Gabriela M.
    Ortega, Edwin M. M.
    Cordeiro, Gauss M.
    [J]. AXIOMS, 2023, 12 (11)
  • [8] OPTIMUM LINEAR-REGRESSION AND ERROR ESTIMATION APPLIED TO U-PB DATA
    DAVIS, DW
    [J]. CANADIAN JOURNAL OF EARTH SCIENCES, 1982, 19 (11) : 2141 - 2149
  • [9] SIMPLE PROGRAM FOR WEIGHTED LINEAR REGRESSION APPLIED TO ACTIVATION-ANALYSIS DATA-PROCESSING
    TACZANOWSKI, S
    [J]. JOURNAL OF RADIOANALYTICAL CHEMISTRY, 1973, 13 (02): 475 - 482
  • [10] Multivariate linear regression with variable selection by a successive projections algorithm applied to the analysis of anodic stripping voltammetry data
    Marreto, Paola D.
    Zimer, Alexsandro M.
    Faria, Ronaldo C.
    Mascaro, Lucia H.
    Pereira, Ernesto C.
    Fragoso, Wallace D.
    Lemos, Sherlan G.
    [J]. ELECTROCHIMICA ACTA, 2014, 127 : 68 - 78