Study on the influence of input variables on the supervised machine learning model for landslide susceptibility mapping
Cited by: 0
Authors:
Lai, Peng [1,2]
Guo, Fei [1,2,3]
Huang, Xiaohu [1,2]
Zhou, Dongwei [1,2]
Wang, Li [1,2]
Chen, Guangfu [1,2,4]
Affiliations:
[1] Minist Educ, Key Lab Geol Hazards Three Gorges Reservoir Area, Yichang 443002, Peoples R China
[2] China Three Gorges Univ, Coll Civil Engn & Architecture, Yichang 443002, Peoples R China
[3] China Univ Geosci, Badong Natl Observat & Res Stn Geohazards, Wuhan 430074, Peoples R China
[4] China Geol Survey, Cent South China Innovat Ctr Geosci, Wuhan Ctr, Wuhan 430205, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Landslide susceptibility;
Input variables;
Supervised machine learning models;
Frequency ratio;
Neighborhood frequency ratio;
SUPPORT VECTOR MACHINE;
LOGISTIC-REGRESSION;
SPATIAL PREDICTION;
FREQUENCY RATIO;
RANDOM SUBSPACE;
FOREST;
BASIN;
AREA;
TREE;
CLASSIFICATION;
DOI:
10.1007/s12665-024-11501-9
Chinese Library Classification (CLC) number:
X [Environmental science, safety science];
Discipline classification code:
08 ;
0830 ;
Abstract:
Supervised machine learning (ML) models are currently popular in landslide susceptibility mapping (LSM). However, the input variables of these models have inherent limitations: the raw input variables lack a nonlinear relationship with landslides, and the discrete and frequency ratio input variables require continuous environmental factors to be discretized, which loses a significant amount of information. To address these issues, this paper adopted a new method, the neighborhood frequency ratio, for obtaining input variables. The study compared the results of four input variables and seven supervised ML models under 28 conditions, using ROC (receiver operating characteristic) curves to evaluate the prediction results. The AUC (area under curve) values, ranging from 0.8223 to 0.9928, show that the input variables are very important to the evaluation model. The experimental results were analyzed from the perspective of algorithm principles and data characteristics. The main conclusions are as follows: (1) For non-tree models (i.e., models other than tree models), the neighborhood frequency ratio of environmental factors should be used as the model input. (2) For tree models (i.e., decision trees and decision-tree-based integrated models), the raw values of environmental factors can be used directly as the inputs of the LSM model. (3) The decision-tree-based integrated models yielded better prediction results.
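To illustrate the discretization step the abstract refers to, the following is a minimal sketch of the classical frequency ratio: for each class of a discretized factor, the share of landslide cells falling in that class divided by the share of all cells in that class. It is not the paper's neighborhood frequency ratio, whose definition is not given in this record. The function name, the slope values, the landslide labels, and the bin edges are all hypothetical and chosen only for illustration.

```python
import numpy as np


def frequency_ratio(factor, landslide, bin_edges):
    """Classical frequency ratio for one continuous factor (illustrative sketch).

    factor    : continuous factor value per cell (hypothetical data)
    landslide : 1/True where a landslide was recorded, else 0/False
    bin_edges : interior class boundaries used to discretize the factor
    Assumes at least one landslide cell is present.
    """
    factor = np.asarray(factor, dtype=float)
    landslide = np.asarray(landslide, dtype=bool)

    # Discretize: class index 0..len(bin_edges) for every cell.
    classes = np.digitize(factor, bin_edges)
    n_classes = len(bin_edges) + 1

    cells_per_class = np.bincount(classes, minlength=n_classes)
    slides_per_class = np.bincount(
        classes, weights=landslide.astype(float), minlength=n_classes
    )

    slide_share = slides_per_class / landslide.sum()  # share of landslides per class
    area_share = cells_per_class / factor.size        # share of all cells per class

    # The ratio is undefined for empty classes; report 0 there.
    fr = np.divide(
        slide_share, area_share, out=np.zeros(n_classes), where=area_share > 0
    )
    # Every cell in a class receives the same value, so within-class
    # variation of the original factor is discarded.
    return fr, fr[classes]


# Hypothetical slope angles (degrees) and landslide labels for eight cells.
slope = [5, 12, 18, 25, 33, 41, 8, 22]
labels = [0, 0, 1, 1, 1, 0, 0, 1]
fr, fr_per_cell = frequency_ratio(slope, labels, bin_edges=[10, 20, 30])
print(fr)           # one value per slope class
print(fr_per_cell)  # the value each cell would pass to a model
```

Because every cell in a class is mapped to a single value, slopes of 12 and 18 degrees become indistinguishable after the transform, which is the kind of information loss the abstract describes.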
Pages: 19