Prediction of Pseudomonas aeruginosa abundance in drinking water distribution systems using machine learning

Authors
Zhou, Qiaomei [1]
Li, Yukang [2]
Wang, Min [2]
Huang, Jingang [1,3]
Li, Weishuai [1]
Qiu, Shanshan [1]
Wang, Haibo [2]
Affiliations
[1] Hangzhou Dianzi Univ, Coll Mat & Environm Engn, Hangzhou 310018, Peoples R China
[2] Chinese Acad Sci, Res Ctr Ecoenvironm Sci, Key Lab Drinking Water Sci & Technol, Beijing 100085, Peoples R China
[3] Hangzhou Dianzi Univ, China Austria Belt & Rd Joint Lab Artificial Intel, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Machine learning; Pseudomonas aeruginosa; Drinking water; Feature selection; Model validation; OPTIMIZATION; SELECTION;
DOI
10.1016/j.psep.2024.11.099
Chinese Library Classification
X [Environmental science, safety science]
Subject classification code
08 ; 0830 ;
Abstract
The detection of Pseudomonas aeruginosa is a challenging but crucial task for ensuring the biosafety of drinking water. Current cultivation and molecular qPCR methods are costly, laborious, and time-consuming, leading to inaccuracies and delayed monitoring. In this study, three machine learning (ML) models, eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Regression (SVR), were developed, interpreted, and validated for their ability to predict P. aeruginosa abundance in both urban and rural drinking water distribution systems (DWDS). To ensure the reliability and robustness of the ML models, data leakage management during data pre-processing, 5-fold cross-validation, and grid search for hyperparameter tuning were used during the training phase. To control overfitting, feature selection using an embedded method was implemented to exclude three low-contributing input variables: oxidation-reduction potential (ORP), total chlorine, and heterotrophic plate counts (HPC). The XGBoost model outperformed the RF and SVR models in accuracy and generalizability when predicting P. aeruginosa abundance, achieving training/testing R² of 0.92/0.85 in the urban system and 0.94/0.87 in the rural system. Feature importance analysis revealed that water temperature, dissolved oxygen (DO), residual chlorine, and NO₃⁻-N were key variables for the prediction. Validation experiments, based on random sampling from both the urban and rural DWDS, demonstrated acceptable relative errors of 10.77% and 8.86%, respectively. Overall, this study provides an applicable ML modeling framework for the accurate and fast prediction of P. aeruginosa abundance in DWDS, potentially reducing laborious experiments in the future.
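The abstract names three evaluation ingredients: 5-fold cross-validation, pre-processing that avoids data leakage, and the R² and relative-error metrics. The following is a minimal Python sketch of those ideas only, not the authors' code. All function names and all numbers are hypothetical, the scaling step is an assumed example of a pre-processing step, and the relative-error formula (mean of |predicted - observed| / |observed|) is an assumption, since the abstract does not define it. Model fitting, grid search, and feature selection are left out.

```python
from statistics import mean, pstdev


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - avg) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed|, as a percentage."""
    return 100.0 * mean(abs(p - o) / abs(o) for o, p in zip(observed, predicted))


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_scaler(values):
    """Return (mean, std) computed from the given (training) values only."""
    std = pstdev(values)
    return mean(values), std if std > 0 else 1.0


def scale(values, params):
    """Standardize values with previously fitted (mean, std) parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical water temperatures (deg C); not data from the study.
temperature = [12.0, 14.5, 16.0, 18.5, 20.0, 21.5, 23.0, 24.5, 26.0, 28.0]

for train_idx, test_idx in kfold_indices(len(temperature), k=5):
    # Fit the scaler on the training fold only, so that no statistics
    # from the held-out fold leak into pre-processing.
    params = fit_scaler([temperature[i] for i in train_idx])
    test_scaled = scale([temperature[i] for i in test_idx], params)
    print(test_idx, [round(v, 2) for v in test_scaled])

# Hypothetical observed vs. predicted abundances (arbitrary units).
observed = [2.0, 2.5, 3.0, 3.5, 4.0]
predicted = [2.1, 2.4, 3.2, 3.3, 4.1]
print("R2:", round(r2_score(observed, predicted), 3))
print("Relative error (%):", round(mean_relative_error(observed, predicted), 2))
```

Fitting the scaler inside the loop is the design choice that matters here: computing the mean and standard deviation once on the full dataset would let held-out samples influence the training inputs.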
Pages: 1050-1060
Page count: 11