Addressing gaps in data on drinking water quality through data integration and machine learning: evidence from Ethiopia

Authors
Alemayehu A. Ambel
Robert Bain
Tefera Bekele Degefu
Ayca Donmez
Richard Johnston
Tom Slaymaker
Affiliations
[1] Development Data Group, World Bank
[2] Division of Data, Analysis, Planning and Monitoring, UNICEF
[3] Department of Environment, Climate Change and Health, WHO
Abstract
Monitoring access to safely managed drinking water services requires information on water quality. An increasing number of countries have integrated water quality testing into household surveys; however, such tests are not expected to be included in all future surveys. Using water testing data from the 2016 Ethiopia Socio-Economic Survey (ESS), we developed predictive models, based on common machine learning classification algorithms, to identify households using contaminated (≥1 E. coli per 100 mL) drinking water sources. These models were then applied to the 2013–2014 and 2018–2019 waves of the ESS, which did not include water testing. The highest-performing model achieved good accuracy (88.5%; 95% CI 86.3%, 90.6%) and discrimination (AUC 0.91; 95% CI 0.89, 0.94). A model using demographic, socioeconomic, and geospatial variables gave results comparable to those of the full-features model, whereas a model based exclusively on water source type performed poorly. Drinking water quality at the point of collection can be predicted from demographic, socioeconomic, and geospatial variables that are often available in household surveys.
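The two headline metrics in the abstract, accuracy with a 95% confidence interval and AUC, can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not ESS data. The abstract does not say how its intervals were computed, so the normal-approximation (Wald) interval used here is an assumption, as are the function names and the 0.5 classification threshold.

```python
import math


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Return accuracy and a normal-approximation (Wald) confidence interval.

    Assumption: the source does not state its CI method; Wald is used
    here only for illustration.
    """
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    acc = correct / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = contaminated source, 0 = not contaminated.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, ci_low, ci_high = accuracy_with_ci(y_true, y_pred)
model_auc = auc(y_true, scores)
print(f"accuracy = {acc:.3f} (95% CI {ci_low:.3f}, {ci_high:.3f})")
print(f"AUC = {model_auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.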
Related papers
  • [32] Characterizing water quality and quantity profiles with poor quality data in a machine learning algorithm
    Kim, Zhonghyun
    Jeong, Heewon
    Shin, Sora
    Jung, Jinho
    Kim, Joon Ha
    Ki, Seo Jin
    DESALINATION AND WATER TREATMENT, 2020, 182 : 127 - 134
  • [33] The effects of data quality on machine learning performance on tabular data
    Mohammed, Sedir
    Budach, Lukas
    Feuerpfeil, Moritz
    Ihde, Nina
    Nathansen, Andrea
    Noack, Nele
    Patzlaff, Hendrik
    Naumann, Felix
    Harmouch, Hazar
    INFORMATION SYSTEMS, 2025, 132
  • [34] AI for all: bridging data gaps in machine learning and health
    Wang, Monica L.
    Bertrand, Kimberly A.
    TRANSLATIONAL BEHAVIORAL MEDICINE, 2025, 15 (01)
  • [35] Multiphysics Missing Data Synthesis: A Machine Learning Approach for Mitigating Data Gaps and Artifacts
    Steuben, J. C.
    Geltmacher, A. B.
    Rodriguez, S. N.
    Graber, B. D.
    Iliopoulos, A. P.
    Michopoulos, J. G.
    JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING, 2024, 24 (05)
  • [36] Addressing Missing Environmental Data via a Machine Learning Scheme
    Tzanis, Chris G.
    Alimissis, Anastasios
    Koutsogiannis, Ioannis
    ATMOSPHERE, 2021, 12 (04)
  • [37] Addressing the Data Scarcity Problem in Ecotoxicology via Small Data Machine Learning Methods
    Wang, Ying
    Dong, Jinchu
    Zhou, Yunchi
    Cheng, Yinghao
    Zhao, Xiaoli
    Peijnenburg, Willie J. G. M.
    Vijver, Martina G.
    Leung, Kenneth M. Y.
    Fan, Wenhong
    Wu, Fengchang
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2025, 59 (12) : 5867 - 5871
  • [38] Omics data integration in computational biology viewed through the prism of machine learning paradigms
    Fouche, Aziz
    Zinovyev, Andrei
    FRONTIERS IN BIOINFORMATICS, 2023, 3
  • [39] DRINKING-WATER QUALITY DATA-BASES
    WENTWORTH, NW
    WESTRICK, JJ
    WANG, KK
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1985, 190 (SEP) : 45 - ENR
  • [40] Multivariate data analysis of quality parameters in drinking water
    Ortiz-Estarelles, O
    Martín-Biosca, Y
    Medina-Hernández, MJ
    Sagrado, S
    Bonet-Domingo, E
    ANALYST, 2001, 126 (01) : 91 - 96