Identifying schools at high-risk for elevated lead in drinking water using only publicly available data

Cited by: 15
Authors
Lobo, G. P. [1]
Laraway, J. [2]
Gadgil, A. J. [1]
Affiliations
[1] Univ Calif Berkeley, Dept Civil & Environm Engn, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Environm Sci Policy & Management, Berkeley, CA 94720 USA
Funding
US National Science Foundation
Keywords
Lead in school drinking water; Lead leaching; Machine learning; Environmental justice; Public data mining; SUPPLY SYSTEMS; TAP WATER; CORROSION; VARIABILITY; PB; MONOCHLORAMINE; ORTHOPHOSPHATE; DISINFECTION; BRASS; FULL;
DOI
10.1016/j.scitotenv.2021.150046
Chinese Library Classification number
X [Environmental Science, Safety Science]
Subject classification codes
08; 0830
Abstract
Estimating the risk of lead contamination of schools' drinking water at the State level is a complex, important, and unexplored challenge. Variable water quality among water systems and changes in water chemistry during distribution affect lead dissolution rates from pipes and fittings. In addition, the locations of lead-bearing plumbing materials are uncertain. We tested the capability of six machine learning models to predict the likelihood of lead contamination of drinking water at the schools' taps using only publicly available datasets. The predictive features used in the models correspond to those with a proven correlation to the dominant, but commonly unavailable, factors that govern lead leaching: the presence of lead-bearing plumbing materials and water quality conducive to lead corrosion. By combining water chemistry data from public reports, socioeconomic information from the US census, and spatial features derived with Geographic Information Systems, we trained and tested models to estimate the likelihood of lead-contaminated tap water in over 8,000 schools across California and Massachusetts. Our best-performing model was a Random Forest, with an average 10-fold cross-validation score, measured as the Area Under the Receiver Operating Characteristic Curve (ROC AUC), of 0.88 for Massachusetts and 0.78 for California. The model was then used to assign a lead-leaching risk category to half of the schools across California (the other half was used for training). There was good agreement between the modeled risk categories and the actual lead-leaching outcomes for every school; however, the model overestimated the lead-leaching risk in up to 17% of the schools. This model is the first of its kind to offer a tool to predict the risk of lead leaching in schools at the State level. Further use of this model can help deploy limited resources more effectively to prevent childhood lead exposure from school drinking water. (c) 2021 Published by Elsevier B.V.
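The evaluation pattern described in the abstract (an average ROC AUC over 10 cross-validation folds, followed by binning predicted likelihoods into risk categories) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the Random Forest is replaced by a hypothetical k-nearest-neighbour stand-in (`fit_knn`), the stratified fold assignment and the low/medium/high cutoffs in `risk_category` are assumptions, and the example data are synthetic. ROC AUC is computed in its pairwise form: the probability that a randomly chosen lead-positive school receives a higher score than a randomly chosen negative one.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Assumption: deal each class round-robin into k folds so every fold keeps
    roughly the overall share of positive schools (needs >= k per class)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=10, seed=0):
    """Mean held-out ROC AUC over k folds. `fit(train_X, train_y)` must return
    a callable mapping one feature row to a risk score."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        scores = [model(features[i]) for i in test_idx]
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return mean(aucs)


def fit_knn(X, y, k=5):
    """Hypothetical stand-in for the Random Forest: predicted risk is the share
    of positive schools among the k nearest training schools."""
    def predict(row):
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, x)), t) for x, t in zip(X, y)
        )[:k]
        return sum(t for _, t in nearest) / k
    return predict


def risk_category(prob, cutoffs=(1 / 3, 2 / 3)):
    """Hypothetical cutoffs for turning a predicted likelihood into a category."""
    if prob < cutoffs[0]:
        return "low"
    if prob < cutoffs[1]:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Synthetic data only: two features shifted upward for "positive" schools.
    rng = random.Random(42)
    y = [1 if i % 3 == 0 else 0 for i in range(200)]
    X = [[rng.gauss(t, 1.0), rng.gauss(t, 1.0)] for t in y]
    print("mean 10-fold ROC AUC:", round(cross_validated_auc(X, y, fit_knn), 2))
    model = fit_knn(X, y)
    print("risk for a new school:", risk_category(model([1.5, 1.2])))
```

The pairwise AUC costs O(P·N) comparisons, which is fine for a sketch; for thousands of schools, a library implementation (for example scikit-learn's `RandomForestClassifier` evaluated with `cross_val_score(..., cv=10, scoring="roc_auc")`) would be the more practical choice. Stratifying the folds keeps both classes in every held-out fold, which ROC AUC requires.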
Pages: 11