MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning

Cited by: 32
Authors
Alkhalifah, Tariq [1]
Wang, Hanchen [1]
Ovcharenko, Oleg [1]
Affiliations
[1] King Abdullah Univ Sci & Technol, Phys Sci & Engn, Mail Box 1280, Thuwal 239556900, Saudi Arabia
Keywords
Neural networks; Induced seismicity; Image processing; Computational seismology; Waveform inversion; INVERSION;
DOI
10.1016/j.aiig.2022.09.002
Chinese Library Classification number
P [Astronomy, Earth Sciences]
Discipline classification code
07
Abstract
Among the biggest challenges we face in utilizing neural networks trained on waveform (i.e., seismic, electromagnetic, or ultrasound) data is their application to real data. The requirement for accurate labels often forces us to train our networks on synthetic data, where labels are readily available. However, synthetic data often fail to capture the reality of the field/real experiment, and the trained neural networks (NNs) end up performing poorly at the inference stage. This is because synthetic data lack many of the realistic features embedded in real data, including an accurate waveform source signature, realistic noise, and accurate reflectivity. In other words, the real data set is far from being a sample from the distribution of the synthetic training set. Thus, we describe a novel approach to enhance our supervised NN training on synthetic data with real data features (domain adaptation). Specifically, for tasks in which the absolute values of the vertical axis (time or depth) of the input section are not crucial to the prediction, like classification, or can be corrected after the prediction, like velocity model building using a well, we suggest a series of linear operations on the network's input data so that the training and application data have similar distributions. This is accomplished by applying two operations on the input data to the NN, whether the input is from the synthetic or real data subset domain: (1) The crosscorrelation of the input data section (i.e., shot gather, seismic image, etc.) with a fixed-location reference trace from the input data section. (2) The convolution of the resulting data with the mean (or a random sample) of the autocorrelated sections from the other subset domain.
In the training stage, the input data are from the synthetic subset domain and the autocorrelated sections (we crosscorrelate each trace with itself) are from the real subset domain, and the random selection of sections from the real data is implemented at every epoch of the training. In the inference/application stage, the input data are from the real subset domain and the mean of the autocorrelated sections is from the synthetic data subset domain. Example applications on passive seismic data for microseismic event source location determination and on active seismic data for predicting low frequencies are used to demonstrate the power of this approach in improving the applicability of our trained NNs to real data.
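The two operations described in the abstract can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function names, the `(n_traces, n_samples)` array layout, the trace-by-trace averaging of the autocorrelations, and the choice to keep the full lag axis of length `2 * n_samples - 1` are all assumptions made for the example. It uses only the mean of the autocorrelated sections; the per-epoch random selection mentioned for training is not shown.

```python
import numpy as np


def crosscorrelate_with_reference(section, ref_index=0):
    """Operation (1): crosscorrelate each trace with a fixed reference trace.

    section: array of shape (n_traces, n_samples).
    Returns an array of shape (n_traces, 2 * n_samples - 1) with zero lag
    at column n_samples - 1.
    """
    n_samples = section.shape[1]
    n_fft = 2 * n_samples - 1  # zero-pad so the circular result is linear
    spec = np.fft.rfft(section, n=n_fft, axis=1)
    xcorr = np.fft.irfft(spec * np.conj(spec[ref_index]), n=n_fft, axis=1)
    # Move negative lags in front of the positive ones.
    return np.roll(xcorr, n_samples - 1, axis=1)


def mean_autocorrelation(sections):
    """Mean of trace-wise autocorrelations over a set of sections.

    sections: array of shape (n_sections, n_traces, n_samples).
    Averaging over sections only (not over traces) is an assumption.
    """
    n_samples = sections.shape[-1]
    n_fft = 2 * n_samples - 1
    spec = np.fft.rfft(sections, n=n_fft, axis=-1)
    auto = np.fft.irfft(np.abs(spec) ** 2, n=n_fft, axis=-1)
    auto = np.roll(auto, n_samples - 1, axis=-1)
    return auto.mean(axis=0)


def mlreal_transform(section, other_domain_sections, ref_index=0):
    """Apply operation (1), then operation (2): convolve each trace with the
    mean autocorrelation taken from the other subset domain."""
    xcorr = crosscorrelate_with_reference(section, ref_index)
    auto = mean_autocorrelation(other_domain_sections)
    # mode="same" keeps the lag axis length and the zero-lag position.
    return np.stack(
        [np.convolve(x, a, mode="same") for x, a in zip(xcorr, auto)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.standard_normal((4, 16))   # stand-in synthetic section
    real = rng.standard_normal((3, 4, 16))     # stand-in real sections
    # Training stage: synthetic input, real-domain autocorrelations.
    print(mlreal_transform(synthetic, real).shape)
```

For the inference stage the roles swap: a real section is passed as `section` and the synthetic sections as `other_domain_sections`, so that both stages feed the network inputs carrying features of both domains.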
Pages: 101-114
Number of pages: 14