MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning

Cited by: 32
Authors
Alkhalifah, Tariq [1]
Wang, Hanchen [1]
Ovcharenko, Oleg [1]
Affiliations
[1] King Abdullah Univ Sci & Technol, Phys Sci & Engn, Mail Box 1280, Thuwal 239556900, Saudi Arabia
Keywords
Neural networks; Induced seismicity; Image processing; Computational seismology; Waveform inversion; INVERSION;
DOI
10.1016/j.aiig.2022.09.002
Chinese Library Classification
P [Astronomy, Earth Sciences]
Discipline classification code
07
Abstract
Among the biggest challenges we face in utilizing neural networks trained on waveform (i.e., seismic, electromagnetic, or ultrasound) data is their application to real data. The requirement for accurate labels often forces us to train our networks using synthetic data, where labels are readily available. However, synthetic data often fail to capture the reality of the field/real experiment, and we end up with poor performance of the trained neural networks (NNs) at the inference stage. This is because synthetic data lack many of the realistic features embedded in real data, including an accurate waveform source signature, realistic noise, and accurate reflectivity. In other words, the real data set is far from being a sample from the distribution of the synthetic training set. Thus, we describe a novel approach to enhance our supervised neural network (NN) training on synthetic data with real data features (domain adaptation). Specifically, for tasks in which the absolute values of the vertical axis (time or depth) of the input section are not crucial to the prediction, like classification, or can be corrected after the prediction, like velocity model building using a well, we suggest a series of linear operations on the network's input data so that the training and application data have similar distributions. This is accomplished by applying two operations on the input data to the NN, whether the input is from the synthetic or real data subset domain: (1) the crosscorrelation of the input data section (i.e., shot gather, seismic image, etc.) with a fixed-location reference trace from the input data section; (2) the convolution of the resulting data with the mean (or a random sample) of the autocorrelated sections from the other subset domain.
In the training stage, the input data are from the synthetic subset domain and the autocorrelated sections (each trace crosscorrelated with itself) are from the real subset domain, and the random selection of sections from the real data is implemented at every epoch of the training. In the inference/application stage, the input data are from the real subset domain and the mean of the autocorrelated sections is from the synthetic data subset domain. Example applications on passive seismic data for microseismic event source location determination and on active seismic data for predicting low frequencies are used to demonstrate the power of this approach in improving the applicability of our trained NNs to real data.
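The two operations described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(n_traces, n_samples)` array layout, the choice to convolve trace by trace along the vertical axis, and the cropping of the result to the correlated length are all assumptions made here for the sake of a runnable example.

```python
import numpy as np


def _correlate(a_spec, b_spec, n):
    """Inverse-transform A * conj(B) and move zero lag to index n - 1."""
    out = np.fft.irfft(a_spec * np.conj(b_spec), 2 * n - 1, axis=-1)
    return np.roll(out, n - 1, axis=-1)


def crosscorrelate_with_reference(section, ref_index=0):
    """Operation (1): correlate every trace with one fixed-location trace.

    section: array of shape (n_traces, n_samples).
    Returns an array of shape (n_traces, 2 * n_samples - 1).
    """
    n = section.shape[-1]
    # Zero-pad to 2n - 1 samples so the correlation is linear, not circular.
    spec = np.fft.rfft(section, 2 * n - 1, axis=-1)
    return _correlate(spec, spec[ref_index], n)


def autocorrelate(sections):
    """Correlate each trace with itself along the last axis."""
    n = sections.shape[-1]
    spec = np.fft.rfft(sections, 2 * n - 1, axis=-1)
    return _correlate(spec, spec, n)


def mlreal_transform(section, other_domain_sections, ref_index=0):
    """Apply operations (1) and (2) to one input section.

    other_domain_sections: array of shape (n_sections, n_traces, n_samples)
    taken from the other subset domain (real data during training,
    synthetic data at inference). Assumed to match `section` in trace
    count and trace length.
    """
    correlated = crosscorrelate_with_reference(section, ref_index)
    # Mean over sections of the per-trace autocorrelations.
    mean_auto = autocorrelate(other_domain_sections).mean(axis=0)
    # Operation (2): trace-by-trace convolution, cropped to the input length.
    return np.stack([
        np.convolve(c, a, mode="same") for c, a in zip(correlated, mean_auto)
    ])


# Hypothetical stand-in data: one synthetic section and five real sections.
rng = np.random.default_rng(0)
synthetic = rng.standard_normal((8, 64))
real = rng.standard_normal((5, 8, 64))
train_input = mlreal_transform(synthetic, real)
print(train_input.shape)  # (8, 127)
```

To mimic the random selection at every epoch, one could pass a single randomly chosen real section with shape `(1, n_traces, n_samples)` instead of the full set. In the idealized case where the two domains share the same reflectivity and differ only in the source wavelet, this sketch maps a synthetic section (convolved with the real autocorrelation) and the matching real section (convolved with the synthetic autocorrelation) to the same array, which is one way to see why the transformed distributions move closer together.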
Pages: 101 - 114
Page count: 14