Geostatistical semi-supervised learning for spatial prediction

Times cited: 4
Authors
Fouedjio, Francky [1]
Talebi, Hassan [2]
Affiliations
[1] Rio Tinto, Data & Analyt, 152-158 St Georges Terrace, Perth, WA 6000, Australia
[2] Rio Tinto, Dev & Technol, 152-158 St Georges Terrace, Perth, WA 6000, Australia
Keywords
Labeled spatial data; Unlabeled spatial data; Spatial autocorrelation; Pseudo labeling; Spatial prediction; Remote-sensing data; Random forest; Classification; Interpolation; Algorithms; Region
DOI
10.1016/j.aiig.2022.12.002
Chinese Library Classification number
P [Astronomy; Earth sciences]
Subject classification code
07
Abstract
Geoscientists are increasingly tasked with spatially predicting a target variable in the presence of auxiliary information using supervised machine learning algorithms. Typically, the target variable is observed at only a few sampling locations because obtaining measurements is relatively time-consuming and costly. In contrast, auxiliary variables are often exhaustively observed within the study region, thanks to the growing development of remote sensing platforms and sensor networks. Supervised machine learning methods do not fully leverage this large amount of auxiliary spatial data: their training dataset includes only labeled data locations (where both the target and auxiliary variables were measured), while unlabeled data locations (where the auxiliary variables were measured but the target variable was not) are ignored during model training. Consequently, only a limited portion of the auxiliary spatial data is used at the training stage. Semi-supervised learning, which learns from both labeled and unlabeled data, offers an alternative to supervised learning for addressing this problem. However, conventional semi-supervised learning techniques do not account for the specific characteristics of spatial data. This paper introduces a spatial semi-supervised learning framework that combines geostatistics and machine learning to harness a large amount of unlabeled spatial data together with a typically smaller set of labeled spatial data. The main idea is to leverage the spatial autocorrelation of the target variable to generate pseudo labels at unlabeled data points that are geographically close to labeled data points. This is achieved through geostatistical conditional simulation, which generates an ensemble of pseudo labels to account for the uncertainty in the pseudo-labeling process.
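A minimal sketch of the pseudo-labeling step follows, under assumptions that the abstract does not state. The target is treated as a Gaussian random field with an exponential covariance whose sill and range (`sill`, `range_`) are taken as known rather than fitted from a variogram; the sample mean stands in for the known mean of simple kriging; "geographically close" is read as lying within a hypothetical distance `max_dist` of the nearest labeled point; and realizations are drawn by Cholesky factorization of the kriging (conditional) covariance. The function names are hypothetical, only coordinates are used (auxiliary variables play no role at this step), and this is not necessarily the conditional simulation algorithm used in the paper.

```python
import numpy as np


def exp_cov(a, b, sill=1.0, range_=10.0):
    """Exponential covariance C(h) = sill * exp(-h / range_) between point sets a (n, 2) and b (m, 2)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-h / range_)


def simulate_pseudo_labels(xy_lab, y_lab, xy_unl, n_real=50, max_dist=5.0,
                           sill=1.0, range_=10.0, nugget=1e-6, seed=0):
    """Draw n_real conditional Gaussian realizations of the target at the unlabeled
    points lying within max_dist of their nearest labeled point (hypothetical helper).

    Returns (sel, sims): indices of the selected unlabeled points and an array of
    shape (n_real, len(sel)), one row of pseudo labels per realization.
    """
    rng = np.random.default_rng(seed)
    # Distance from each unlabeled point to its nearest labeled point
    d_near = np.linalg.norm(xy_unl[:, None, :] - xy_lab[None, :, :], axis=-1).min(axis=1)
    sel = np.flatnonzero(d_near <= max_dist)
    if sel.size == 0:
        return sel, np.empty((n_real, 0))
    xy_sel = xy_unl[sel]
    mean = y_lab.mean()  # simple kriging, treating the sample mean as the known mean
    c_ll = exp_cov(xy_lab, xy_lab, sill, range_) + nugget * np.eye(len(xy_lab))
    c_sl = exp_cov(xy_sel, xy_lab, sill, range_)
    c_ss = exp_cov(xy_sel, xy_sel, sill, range_)
    w = np.linalg.solve(c_ll, c_sl.T).T          # simple kriging weights, (n_sel, n_lab)
    cond_mean = mean + w @ (y_lab - mean)        # kriging mean at the selected points
    cond_cov = c_ss - w @ c_sl.T                 # kriging (conditional) covariance
    # Cholesky factor of the conditional covariance, with a small jitter for stability
    chol = np.linalg.cholesky(cond_cov + 1e-6 * np.eye(sel.size))
    z = rng.standard_normal((n_real, sel.size))
    return sel, cond_mean + z @ chol.T           # each row ~ N(cond_mean, cond_cov)


# Toy usage on synthetic coordinates (all values are made up for illustration)
rng = np.random.default_rng(1)
xy_lab = rng.uniform(0, 50, size=(30, 2))
y_lab = np.sin(xy_lab[:, 0] / 10.0) + rng.normal(0, 0.1, size=30)
xy_unl = rng.uniform(0, 50, size=(500, 2))
sel, sims = simulate_pseudo_labels(xy_lab, y_lab, xy_unl, n_real=20)
print(sims.shape)  # (number of realizations, number of selected unlabeled points)
```

Each row of `sims` is one set of pseudo labels for the points indexed by `sel`; appending it to the observed labels yields one pseudo training dataset. Points farther than `max_dist` from every labeled point receive no pseudo label, reflecting the idea that the target's spatial autocorrelation is most informative close to observed locations.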
The observed labels are augmented with this ensemble of pseudo labels to create an ensemble of pseudo training datasets. A supervised machine learning model is then trained on each pseudo training dataset, and the trained models are aggregated. The proposed geostatistical semi-supervised learning method is applied to synthetic and real-world spatial datasets, and its predictive performance is compared with that of several classical supervised and semi-supervised machine learning methods. The results suggest that it can effectively leverage a large amount of unlabeled spatial data to improve the spatial prediction of the target variable.
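The ensemble training and aggregation step can be sketched as below. The sketch assumes ordinary least squares as the supervised learner (any regressor could be substituted) and aggregation by averaging member predictions, with the ensemble spread reported as a rough uncertainty indicator; neither choice is confirmed by the abstract. `ensemble_predict`, `fit_ols`, and `predict_ols` are hypothetical names, and the pseudo labels in the demo are synthetic noisy copies of a known signal, not the output of a geostatistical simulation.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept (a stand-in for any supervised learner)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Linear prediction from the intercept and slope coefficients."""
    return coef[0] + X @ coef[1:]


def ensemble_predict(X_lab, y_lab, X_pseudo, pseudo_ensemble, X_new):
    """Fit one model per pseudo training dataset and aggregate by averaging.

    X_lab: (n_lab, p) auxiliary variables at labeled locations; y_lab: (n_lab,) labels
    X_pseudo: (n_sel, p) auxiliary variables at pseudo-labeled locations
    pseudo_ensemble: (n_real, n_sel), one row of pseudo labels per realization
    X_new: (n_new, p) auxiliary variables at prediction locations
    Returns the ensemble mean and standard deviation of the predictions at X_new.
    """
    X_train = np.vstack([X_lab, X_pseudo])           # same inputs for every member
    preds = []
    for pseudo in pseudo_ensemble:
        y_train = np.concatenate([y_lab, pseudo])    # observed labels + one realization
        preds.append(predict_ols(fit_ols(X_train, y_train), X_new))
    preds = np.asarray(preds)                        # (n_real, n_new)
    return preds.mean(axis=0), preds.std(axis=0)


def truth(X):
    """Known linear signal used only for this toy example."""
    return 2.0 + 3.0 * X[:, 0] - X[:, 1]


# Toy usage: noisy copies of the known signal stand in for simulated pseudo labels
rng = np.random.default_rng(0)
X_lab = rng.uniform(0, 1, size=(20, 2))
y_lab = truth(X_lab)
X_pseudo = rng.uniform(0, 1, size=(100, 2))
pseudo_ensemble = truth(X_pseudo) + rng.normal(0, 0.2, size=(25, 100))
X_new = rng.uniform(0, 1, size=(5, 2))
mean, sd = ensemble_predict(X_lab, y_lab, X_pseudo, pseudo_ensemble, X_new)
print(np.round(mean - truth(X_new), 2))
```

Averaging is one simple aggregation rule; the spread `sd` across members gives a rough sense of how much the pseudo-label uncertainty propagates into the predictions.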
Pages: 162-178
Page count: 17