CGAN-based synthetic multivariate time-series generation: a solution to data scarcity in solar flare forecasting

Cited by: 5
Authors
Chen, Yang [1]
Kempton, Dustin J. [1]
Ahmadzadeh, Azim [1]
Wen, Junzhi [1]
Ji, Anli [1]
Angryk, Rafal A. [1]
Affiliation
[1] Georgia State Univ, Atlanta, GA 30302 USA
Source
Neural Computing & Applications | 2022, Vol. 34, No. 16
Funding
U.S. National Science Foundation
Keywords
Multivariate time series; Class imbalance; Generative adversarial network; Flare forecasting
DOI
10.1007/s00521-022-07361-8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One of the major bottlenecks in refining supervised algorithms is data scarcity. Scarcity can have many causes, often rooted in extremely expensive and lengthy data collection processes. In natural domains such as heliophysics, it may take decades to collect samples large enough for machine learning purposes. Inspired by the massive success of generative adversarial networks (GANs) in generating synthetic images, in this study we employed the conditional GAN (CGAN) on a recently released benchmark dataset tailored for solar flare forecasting. Our goal is to generate synthetic multivariate time-series data that (1) are statistically similar to the real data and (2) improve the performance of flare prediction when used to remedy the scarcity of strong flares. To evaluate the generated samples, first, we used the Kullback-Leibler divergence and adversarial accuracy measures to quantify the similarity between the real and synthetic data in terms of their descriptive statistics. Second, we evaluated the impact of the generated samples by training a predictive model on their descriptive statistics, which resulted in a significant improvement (over 1100% in TSS and 350% in HSS). Third, we used the generated time series to examine their high-dimensional contribution to mitigating the scarcity of strong flares, where we also observed a significant improvement in terms of TSS (4%, 7%, and 31%) and HSS (75%, 35%, and 72%) compared to oversampling, undersampling, and synthetic oversampling methods, respectively. We believe our findings can open new doors toward more robust and accurate flare forecasting models.
Pages: 13339-13353
Number of pages: 15
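
The abstract reports its results in terms of the Kullback-Leibler divergence, TSS (true skill statistic), and HSS (Heidke skill score) without defining them. The following is a minimal sketch of how these quantities are commonly computed, not the authors' code. It assumes a binary confusion matrix (tp, fn, fp, tn), the widely used two-class form of HSS (the abstract does not say which HSS variant was used), and a discrete, histogram-based KL divergence. All function names are hypothetical.

```python
import math


def tss(tp, fn, fp, tn):
    """True skill statistic: recall minus false-alarm rate."""
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke skill score, two-class form (assumed variant)."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(P || Q) between two histograms.

    Both inputs are normalized to sum to 1. Bins where P is zero
    contribute nothing; zero bins in Q are floored at eps (an
    assumption of this sketch) to avoid division by zero.
    """
    p_total = sum(p)
    q_total = sum(q)
    total = 0.0
    for p_count, q_count in zip(p, q):
        p_i = p_count / p_total
        q_i = q_count / q_total
        if p_i > 0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


if __name__ == "__main__":
    # Illustrative counts only, not taken from the paper.
    print("TSS:", tss(tp=40, fn=10, fp=20, tn=930))
    print("HSS:", hss(tp=40, fn=10, fp=20, tn=930))
    print("KL :", kl_divergence([5, 3, 2], [4, 4, 2]))
```

TSS does not depend on the class-imbalance ratio, whereas HSS does, which is one reason flare forecasting studies often report both.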