Application of multi-algorithm ensemble methods in high-dimensional and small-sample data of geotechnical engineering: A case study of swelling pressure of expansive soils

Cited by: 18
Authors
Li, Chao [1 ]
Wang, Lei [1 ]
Li, Jie [2 ]
Chen, Yang [3 ]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Urban Railway Transportat, Shanghai 201620, Peoples R China
[2] RMIT Univ, Discipline Civil & Infrastruct Engn, Melbourne 3001, Australia
[3] Shanghai Jiao Tong Univ, Sch Naval Architecture & Civil Engn, Shanghai 200240, Peoples R China
Keywords
Expansive soils; Swelling pressure; Machine learning (ML); Multi-algorithm ensemble; Sensitivity analysis; PREDICTION; SUCTION; HEAVE;
DOI
10.1016/j.jrmge.2023.10.015
Chinese Library Classification (CLC) number
P5 [Geology];
Discipline classification code
0709 ; 081803 ;
Abstract
Geotechnical engineering data are usually small-sample and high-dimensional, which poses many challenges for predictive modeling. This paper uses a typical high-dimensional, small-sample swelling pressure (Ps) dataset to explore whether multi-algorithm hybrid ensembles and dimensionality reduction can mitigate the uncertainty of soil parameter prediction. A base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods are superior to traditional prediction models and base ML models, and that FWL is more suitable for modeling with small-sample datasets. Dimensionality reduction can simplify the data structure and reduce the adverse impact of the small-sample effect, which points the way to feature selection for predictive modeling. Based on the ensemble methods, the five primary factors affecting Ps, with their feature importances, are the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of the input parameters on Ps is also investigated, and the results are in line with the findings of the existing literature. (c) 2024 Institute of Rock and Soil Mechanics, Chinese Academy of Sciences. Production and hosting by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
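As a rough illustration of two ideas named in the abstract, the sketch below averages the predictions of several base learners (the idea behind voting regression) and then ranks features by permutation importance, dropping those whose shuffling does not change the error. It is not the authors' implementation: the synthetic data, the three fixed "base learners", the helper names (`voting_predict`, `permutation_importance`), and the `1e-6` threshold are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def voting_predict(models, X):
    """Average the base-learner predictions for every row of X."""
    return [mean(m(row) for m in models) for row in X]


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical synthetic data: feature 0 is strong, feature 1 is weak,
# and feature 2 has no effect on the target.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(60)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]

# Stand-ins for fitted base learners (assumed, not trained here).
models = [
    lambda r: 3.0 * r[0] + 0.5 * r[1],
    lambda r: 2.8 * r[0] + 0.6 * r[1],
    lambda r: 3.2 * r[0] + 0.4 * r[1],
]


def ensemble_predict(rows):
    return voting_predict(models, rows)


importances = permutation_importance(ensemble_predict, X, y)

# Keep only features whose permutation measurably degrades the prediction.
keep = [j for j, imp in enumerate(importances) if imp > 1e-6]
print("importances:", importances)
print("features kept:", keep)
```

In practice the base learners would be trained models and the importance would be evaluated on held-out data; with a small sample, repeating the shuffle several times (as `n_repeats` does here) helps stabilize the estimate.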
Pages: 1896-1917
Page count: 22
Related papers
6 items in total
  • [1] Analysis of traffic accident causes based on data augmentation and ensemble learning with high-dimensional small-sample data
    Zhu, Leipeng
    Zhang, Zhiqing
    Song, Dongdong
    Chen, Biao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [2] A Hybrid Feature Selection Algorithm Applied to High-dimensional Imbalanced Small-sample Data Classification
    Feng, Fang
    Lv, Qingquan
    Wang, Mingsong
    Yang, Xuhui
    Zhou, Qingguo
    Zhou, Rui
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 41 - 46
  • [3] Multisource Latent Feature Selective Ensemble Modeling Approach for Small-Sample High-Dimensional Process Data in Applications
    Tang, Jian
    Zhang, Jian
    Yu, Gang
    Zhang, Wenping
    Yu, Wen
    IEEE ACCESS, 2020, 8 : 148475 - 148488
  • [4] A Methodology for Modeling a Multi-Dimensional Joint Distribution of Parameters Based on Small-Sample Data, and Its Application in High Rockfill Dams
    Guo, Qinqin
    Huang, Huibao
    Lu, Xiang
    Chen, Jiankang
    Zhang, Xiaoshuang
    Zhao, Zhiyi
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [5] A Hybrid Improved Multi-objective Particle Swarm Optimization Feature Selection Algorithm for High-Dimensional Small Sample Data
    Pan, Xiaoying
    Sun, Jun
    Xue, Yufeng
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 475 - 482
  • [6] Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size
    Ajana, Soufiane
    Acar, Niyazi
    Bretillon, Lionel
    Hejblum, Boris P.
    Jacqmin-Gadda, Helene
    Delcourt, Cecile
    Berdeaux, Olivier
    Bouton, Sylvain
    Bron, Alain
    Buaud, Benjamin
    Cabaret, Stephanie
    Cougnard-Gregorie, Audrey
    Creuzot-Garcher, Catherine
    Delyfer, Marie-Noelle
    Feart-Couret, Catherine
    Febvret, Valerie
    Gregoire, Stephane
    He, Zhiguo
    Korobelnik, Jean-Francois
    Martine, Lucy
    Merle, Benedicte
    Vaysse, Carole
    BIOINFORMATICS, 2019, 35 (19) : 3628 - 3634