CGAN-based synthetic multivariate time-series generation: a solution to data scarcity in solar flare forecasting

Cited by: 5
Authors
Chen, Yang [1]
Kempton, Dustin J. [1]
Ahmadzadeh, Azim [1]
Wen, Junzhi [1]
Ji, Anli [1]
Angryk, Rafal A. [1]
Affiliation
[1] Georgia State Univ, Atlanta, GA 30302 USA
Source
NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / Issue 16
Funding
U.S. National Science Foundation;
Keywords
Multivariate time series; Class imbalance; Generative adversarial network; Flare forecasting;
DOI
10.1007/s00521-022-07361-8
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the major bottlenecks in refining supervised algorithms is data scarcity. Scarcity can have many causes, often rooted in expensive and lengthy data collection processes. In natural domains such as Heliophysics, it may take decades to accumulate samples large enough for machine learning. Inspired by the great success of generative adversarial networks (GANs) in generating synthetic images, in this study we employed a conditional GAN (CGAN) on a recently released benchmark dataset tailored for solar flare forecasting. Our goal is to generate synthetic multivariate time-series data that (1) are statistically similar to the real data and (2) improve flare prediction performance when used to remedy the scarcity of strong flares. To evaluate the generated samples, we first used the Kullback-Leibler divergence and adversarial accuracy measures to quantify the similarity between the real and synthetic data in terms of their descriptive statistics. Second, we evaluated the impact of the generated samples by training a predictive model on their descriptive statistics, which resulted in a significant improvement (over 1100% in the True Skill Statistic, TSS, and 350% in the Heidke Skill Score, HSS). Third, we used the generated high-dimensional time series to examine their contribution to mitigating the scarcity of strong flares, and again observed significant improvements in TSS (4%, 7%, and 31%) and HSS (75%, 35%, and 72%) compared to oversampling, undersampling, and synthetic oversampling methods, respectively. We believe our findings can open new doors toward more robust and accurate flare forecasting models.
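The abstract reports its forecasting results in TSS and HSS and measures real-versus-synthetic similarity with the Kullback-Leibler divergence. A minimal Python sketch of these three measures follows. It assumes binary confusion-matrix counts with strong flares as the positive class, the HSS form commonly used in flare forecasting, and a histogram-based KL estimate over one descriptive statistic. The function names `tss_hss` and `kl_divergence`, the shared binning, and the `eps` smoothing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def tss_hss(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (TSS, HSS) from binary confusion-matrix counts.

    Assumption: the strong-flare class is the positive class, and HSS uses
    the form common in flare forecasting (skill relative to a random
    forecast with the same marginal totals).
    """
    # TSS = hit rate on positives minus false-alarm rate on negatives
    tss = tp / (tp + fn) - fp / (fp + tn)
    # HSS = 2(TP*TN - FN*FP) / ((TP+FN)(FN+TN) + (TP+FP)(FP+TN))
    hss = 2 * (tp * tn - fn * fp) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return tss, hss


def kl_divergence(real, synthetic, bins: int = 20, eps: float = 1e-10) -> float:
    """Estimate KL(P_real || Q_synthetic) from histograms of one statistic.

    Assumption (hypothetical choices for this sketch): both samples share
    bin edges computed over their union, and a small eps is added so empty
    synthetic bins do not produce log(0) or division by zero.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # Shared edges so both histograms cover the same support
    edges = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=bins)
    p, _ = np.histogram(real, bins=edges)
    q, _ = np.histogram(synthetic, bins=edges)
    # Normalize to probabilities, smooth, then renormalize
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

For example, `tss_hss(40, 10, 30, 920)` (40 hits, 10 misses, 30 false alarms, 920 correct negatives) gives TSS ≈ 0.768 and HSS ≈ 0.646. TSS is built from per-class rates, so it does not depend on the ratio of flaring to non-flaring samples, which is one reason it is favored for imbalanced flare data; HSS does depend on that ratio.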
Pages: 13339-13353
Number of pages: 15
Related papers
50 in total
  • [1] CGAN-based synthetic multivariate time-series generation: a solution to data scarcity in solar flare forecasting
    Chen, Yang
    Kempton, Dustin J.
    Ahmadzadeh, Azim
    Wen, Junzhi
    Ji, Anli
    Angryk, Rafal A.
    NEURAL COMPUTING AND APPLICATIONS, 2022, 34 : 13339 - 13353
  • [2] Towards Synthetic Multivariate Time Series Generation for Flare Forecasting
    Chen, Yang
    Kempton, Dustin J.
    Ahmadzadeh, Azim
    Angryk, Rafal A.
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT I, 2021, 12854 : 296 - 307
  • [3] Time-Series Feature Selection for Solar Flare Forecasting
    Velanki, Yagnashree
    Hosseinzadeh, Pouya
    Boubrahimi, Soukaina Filali
    Hamdi, Shah Muhammad
    UNIVERSE, 2024, 10 (09)
  • [4] Multioutput Framework for Time-Series Forecasting in Smart Grid Meets Data Scarcity
    Xu, Jiangjiao
    Li, Ke
    Li, Dongdong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (09) : 11202 - 11212
  • [5] A Neural Networks Based Method for Multivariate Time-Series Forecasting
    Li, Shaowei
    Huang, He
    Lu, Wei
    IEEE ACCESS, 2021, 9 : 63915 - 63924
  • [6] Exploring Variational Autoencoders and Generative Latent Time-Series Models for Synthetic Data Generation and Forecasting
    Dodda, Suresh
    2024 CONTROL INSTRUMENTATION SYSTEM CONFERENCE, CISCON 2024, 2024,
  • [7] AutoMixer for Improved Multivariate Time-Series Forecasting on Business and IT Observability Data
    Palaskar, Santosh
    Ekambaram, Vijay
    Jati, Arindam
    Gantayat, Neelamadhav
    Saha, Avirup
    Nagar, Seema
    Nguyen, Nam H.
    Dayama, Pankaj
    Sindhgatta, Renuka
    Mohapatra, Prateeti
    Kumar, Harshit
    Kalagnanam, Jayant
    Hemachandra, Nandyala
    Rangaraj, Narayan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 22962 - 22968
  • [8] MULTIVARIATE TIME SERIES SYNTHETIC DATA GENERATION IN DIABETES CARE
    Herrero, P.
    Zhu, T.
    Andorra, M.
    Chittajallu, S.
    DIABETES TECHNOLOGY & THERAPEUTICS, 2023, 25 : A239 - A239
  • [9] Clustering Individuals Based on Multivariate EMA Time-Series Data
    Ntekouli, Mandani
    Spanakis, Gerasimos
    Waldorp, Lourens
    Roefs, Anne
    QUANTITATIVE PSYCHOLOGY, 2023, 422 : 161 - 171
  • [10] Identifying Flare-indicative Photospheric Magnetic Field Parameters from Multivariate Time-series Data of Solar Active Regions
    Alshammari, Khaznah
    Hamdi, Shah Muhammad
    Filali Boubrahimi, Soukaina
    ASTROPHYSICAL JOURNAL SUPPLEMENT SERIES, 2024, 271 (02):