Application of multi-algorithm ensemble methods in high-dimensional and small-sample data of geotechnical engineering: A case study of swelling pressure of expansive soils

Times cited: 18

Authors:

Li, Chao [1]

Wang, Lei [1]

Li, Jie [2]

Chen, Yang [3]

Affiliations:

[1] Shanghai Univ Engn Sci, Sch Urban Railway Transportat, Shanghai 201620, Peoples R China

[2] RMIT Univ, Discipline Civil & Infrastruct Engn, Melbourne 3001, Australia

[3] Shanghai Jiao Tong Univ, Sch Naval Architecture & Civil Engn, Shanghai 200240, Peoples R China

Source:

JOURNAL OF ROCK MECHANICS AND GEOTECHNICAL ENGINEERING | 2024 / Vol. 16 / No. 05

Keywords:

Expansive soils; Swelling pressure; Machine learning (ML); Multi-algorithm ensemble; Sensitivity analysis; PREDICTION; SUCTION; HEAVE

DOI:

10.1016/j.jrmge.2023.10.015

CLC number (Chinese Library Classification):

P5 [Geology]

Subject classification codes:

0709; 081803

Abstract:

Geotechnical engineering data are usually small-sample and high-dimensional, which poses many challenges for predictive modeling. This paper uses a typical high-dimensional, small-sample swelling pressure (Ps) dataset to explore whether multi-algorithm hybrid ensemble and dimensionality reduction methods can mitigate the uncertainty of soil parameter prediction. A base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods outperform traditional prediction models and the base ML models. FWL is better suited to modeling with small-sample datasets, and dimensionality reduction simplifies the data structure and reduces the adverse impact of the small-sample effect, which offers guidance on feature selection for predictive modeling. Based on the ensemble methods, the five primary factors affecting Ps, ranked by feature importance, are maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of the input parameters on Ps is also investigated, and the findings are in line with the existing literature. (c) 2024 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
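The record does not describe how the ensembles or the importance ranking are computed. The following is a minimal Python sketch, not the authors' code, of two ingredients the abstract names: voting regression (averaging the predictions of the base learners) and permutation importance (the mean increase in prediction error when one input column is shuffled). The two base learners, the features x0 to x2, and the toy data are hypothetical stand-ins, and rescaling the importances to sum to 100% is only one plausible way to obtain percentage figures like those quoted above; the paper's exact normalization is not stated here.

```python
import random
from statistics import mean


def voting_predict(base_learners, row):
    """Voting regression: average the predictions of all base learners."""
    return mean(model(row) for model in base_learners)


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(mean(increases))
    return importances


def to_percent(importances):
    """Clip negative scores to 0 and rescale so the scores sum to 100."""
    clipped = [max(s, 0.0) for s in importances]
    total = sum(clipped)
    return [100.0 * s / total if total else 0.0 for s in clipped]


if __name__ == "__main__":
    # Hypothetical toy data: the target depends strongly on x0, weakly on x1,
    # and not at all on x2 (a weakly correlated candidate for removal).
    rng = random.Random(42)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(50)]
    y = [3 * r[0] + r[1] for r in X]

    # Two hypothetical base learners standing in for trained ML models.
    base_learners = [
        lambda r: 3 * r[0] + r[1] + 0.5,
        lambda r: 3 * r[0] + r[1] - 0.5,
    ]

    def ensemble(row):
        return voting_predict(base_learners, row)

    scores = to_percent(permutation_importance(ensemble, X, y))
    for name, s in zip(["x0", "x1", "x2"], scores):
        print(f"{name}: {s:.1f}%")
```

Under these assumptions x2, which the ensemble ignores, scores exactly 0%, which is the kind of weakly correlated input that permutation-based dimensionality reduction would drop before refitting. In practice importances are usually computed on held-out data; scikit-learn offers `sklearn.inspection.permutation_importance`, `VotingRegressor`, and `StackingRegressor` for the same ideas.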

Pages: 1896-1917

Number of pages: 22
