Addressing gaps in data on drinking water quality through data integration and machine learning: evidence from Ethiopia

Authors
Alemayehu A. Ambel
Robert Bain
Tefera Bekele Degefu
Ayca Donmez
Richard Johnston
Tom Slaymaker
Affiliations
[1] Development Data Group, World Bank
[2] Division of Data, Analysis, Planning and Monitoring, UNICEF
[3] Department of Environment, Climate Change and Health, WHO
Abstract
Monitoring access to safely managed drinking water services requires information on water quality. An increasing number of countries have integrated water quality testing into household surveys; however, such tests are not expected to be included in all future surveys. Using water testing data from the 2016 Ethiopia Socio-Economic Survey (ESS), we developed predictive models, based on common machine learning classification algorithms, to identify households using contaminated (≥1 E. coli per 100 mL) drinking water sources. These models were then applied to the 2013–2014 and 2018–2019 waves of the ESS, which did not include water testing. The highest-performing model achieved good accuracy (88.5%; 95% CI 86.3%, 90.6%) and discrimination (AUC 0.91; 95% CI 0.89, 0.94). A model using demographic, socioeconomic, and geospatial variables gave results comparable to those of the full-feature model, whereas a model based exclusively on water source type performed poorly. Drinking water quality at the point of collection can be predicted from demographic, socioeconomic, and geospatial variables that are often available in household surveys.
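The abstract reports accuracy and AUC with 95% confidence intervals. As an illustration only, the sketch below shows one common way such figures can be computed for a binary classifier: a rank-based AUC and a percentile bootstrap interval. The abstract does not say how the authors derived their intervals, so the bootstrap here is an assumption, and the labels and scores are synthetic placeholders rather than ESS data. All function names are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, values, metric, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, values).
    Assumption: this is an illustrative choice, not the paper's method."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip resamples with a single class (AUC undefined)
        stats.append(metric(yt, [values[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[max(0, int((1 - alpha / 2) * len(stats)) - 1)]
    return lo, hi


# Synthetic stand-in data: 1 = contaminated source, 0 = not contaminated.
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(200)]
# Hypothetical model scores, noisier for harder cases, clipped to [0, 1].
scores = [min(1.0, max(0.0, 0.3 + 0.4 * t + rng.gauss(0, 0.2))) for t in y_true]
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
acc_ci = bootstrap_ci(y_true, y_pred, accuracy)
auc_ci = bootstrap_ci(y_true, scores, auc)

print(f"accuracy {acc:.3f} (95% CI {acc_ci[0]:.3f}, {acc_ci[1]:.3f})")
print(f"AUC      {auc_value:.3f} (95% CI {auc_ci[0]:.3f}, {auc_ci[1]:.3f})")
```

The 0.5 threshold used for the accuracy figure is also an assumption. AUC does not depend on a threshold, which is one reason it is reported alongside accuracy as a measure of discrimination.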
Related papers
50 in total
  • [21] Data Integration and Machine Learning: A Natural Synergy
    Dong, Xin Luna
    Rekatsinas, Theodoros
    SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 1645 - 1650
  • [22] Data Integration and Machine Learning: A Natural Synergy
    Dong, Xin Luna
    Rekatsinas, Theodoros
    KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 3193 - 3194
  • [23] Interactive Machine Learning for Laboratory Data Integration
    Fillmore, Nathanael
    Do, Nhan
    Brophy, Mary
    Zimolzak, Andrew
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 133 - 137
  • [24] Amalur: The Convergence of Data Integration and Machine Learning
    Li, Ziyu
    Sun, Wenbo
    Zhan, Danning
    Kang, Yan
    Chen, Lydia
    Bozzon, Alessandro
    Hai, Rihan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (12) : 7353 - 7367
  • [25] Air Quality Forecast through Integrated Data Assimilation and Machine Learning
    Lin, Hai Xiang
    Jin, Jianbing
    van den Herik, Jaap
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 787 - 793
  • [26] Data Quality for Machine Learning Tasks
    Gupta, Nitin
    Mujumdar, Shashank
    Patel, Hima
    Masuda, Satoshi
    Panwar, Naveen
    Bandyopadhyay, Sambaran
    Mehta, Sameep
    Guttula, Shanmukha
    Afzal, Shazia
    Mittal, Ruhi Sharma
    Munigala, Vitobha
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4040 - 4041
  • [27] Machine learning and topological kriging for river water quality data interpolation
    Bekti, Rokhana Dwi
    Suryowati, Kris
    Dedu, Maria Oktafiana
    Sulistyaningsih, Eka
    Susanti, Erma
    AIMS ENVIRONMENTAL SCIENCE, 2025, 12 (01) : 120 - 136
  • [28] Differential exposure to drinking water contaminants in North Carolina: Evidence from structural topic modeling and water quality data
    Sohns, Antonia
    JOURNAL OF ENVIRONMENTAL MANAGEMENT, 2023, 336
  • [29] Drinking water quality: results from the data analysis in Lombardy region
    Rivolta, S.
    Diurno, G.
    Ammoni, E.
    Castaldi, S.
    Gramegna, M.
    EUROPEAN JOURNAL OF PUBLIC HEALTH, 2019, 29 : 488 - 488
  • [30] Filling the gaps in soil data: A multi-model framework for addressing data gaps using pedotransfer functions and machine-learning with uncertainty estimates to estimate bulk density
    Arbor, Adrienne
    Schmidt, Margaret
    Zhang, Jin
    Bulmer, Chuck
    Filatow, Deepa
    Kasraei, Babak
    Smukler, Sean
    Heung, Brandon
    CATENA, 2024, 245