Handling high-dimensional data in air pollution forecasting tasks

Cited by: 8
Authors
Domanska, Diana [1, 4]
Lukasik, Szymon [2, 3]
Affiliations
[1] Univ Silesia, Inst Comp Sci, Ul Bedzinska 39, PL-41200 Sosnowiec, Poland
[2] Polish Acad Sci, Syst Res Inst, Ul Newelska 6, PL-01447 Warsaw, Poland
[3] AGH Univ Sci & Technol, Fac Phys & Appl Comp Sci, Al Mickiewicza 30, PL-30059 Krakow, Poland
[4] Univ Oslo, Dept Informat, POB 1072, N-0316 Oslo, Norway
Keywords
Big data; Multidimensional data; Dimensionality reduction; Fractional distances; Forecasting; Pollution; PRINCIPAL COMPONENT; FEATURE-SELECTION; REDUCTION; MODEL; PREDICTION; ALGORITHM; INDEX; PM10;
DOI
10.1016/j.ecoinf.2016.04.007
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
This paper proposes methods for handling the high-dimensional weather forecast data used to predict concentrations of PM10, PM2.5, SO2, NO, CO and O3. Pollution prediction normally requires historical data samples for a large number of points in time, in particular weather forecast data, actual weather data and pollution data. It also typically involves numerous features related to atmospheric conditions. Consequently, analysing such datasets to generate accurate forecasts becomes a very cumbersome task. The paper examines a variety of unsupervised dimensionality reduction methods aimed at obtaining a compact yet informative set of features. As an alternative, an approach using fractional distances for data analysis tasks is also considered. Both strategies were evaluated on real-world data obtained from the Institute of Meteorology and Water Management in Katowice (Poland), with the extended Air Pollution Forecast Model (e-APFM) used as the underlying prediction tool. Employing a fractional distance as the dissimilarity measure was found to give the best forecasting accuracy. Satisfactory results can also be obtained with Isomap, Landmark Isomap and Factor Analysis as dimensionality reduction techniques. These methods can also be used to formulate a universal mapping that is ready to use for data gathered in different geographical areas. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 70-91
Number of pages: 22
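The abstract names a fractional distance as the best-performing dissimilarity measure but does not define it. A minimal sketch, assuming the common definition (a Minkowski-style distance with exponent 0 < p < 1), is given below. The function names, the default p = 0.5, and the nearest-neighbour lookup are illustrative assumptions and are not taken from the paper or from e-APFM.

```python
from typing import Sequence


def fractional_distance(x: Sequence[float], y: Sequence[float], p: float = 0.5) -> float:
    """Minkowski-style dissimilarity with a fractional exponent 0 < p < 1.

    Assumed definition: (sum_i |x_i - y_i|^p)^(1/p).
    For p < 1 this is not a true metric (the triangle inequality can fail),
    which is why it is called a dissimilarity measure here.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_index(query: Sequence[float],
                  history: Sequence[Sequence[float]],
                  p: float = 0.5) -> int:
    """Index of the historical vector most similar to the query (hypothetical helper)."""
    return min(range(len(history)),
               key=lambda i: fractional_distance(query, history[i], p))


if __name__ == "__main__":
    # Toy feature vectors standing in for weather-forecast features (made up).
    history = [[3.0, 3.0], [1.0, 0.0], [0.5, 0.5]]
    query = [0.0, 0.0]
    print(nearest_index(query, history, p=0.5))
```

With p = 0.5 the toy query is matched to `[1.0, 0.0]` (index 1), whereas the Euclidean distance would pick `[0.5, 0.5]`. A fractional exponent penalises many small per-feature differences more than one larger difference, which can change the neighbour ranking.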