Machine learning (ML) tools were used to predict disease prevalence (bacterial, viral, and other diseases) from air pollutants such as inhalable particulate matter, sulfur dioxide, nitrogen dioxide, carbon monoxide, and ground-level ozone. The disease prevalence variable was classified into high and low levels and fed into the ML framework. Random forest (RF), quadratic discriminant analysis (QDA), k-nearest neighbors (KNN), naïve Bayes (NB), and linear discriminant analysis (LDA) models, among others, were compared on prediction accuracy, kappa statistic, sensitivity, and specificity. The KNN and LDA models each yielded an accuracy of 85%. The highest sensitivity (100%) was obtained with the KNN model, and the LDA model achieved a moderate kappa statistic. The highest specificity (100%) was obtained with QDA. Geographically weighted regression was applied to assess the effect of the spatial component in the data; it gave an R² value of 0.63 with a moderate Akaike Information Criterion and a low condition number, indicating a stable model. Risk/susceptibility maps were produced with relative weights, and spatial distribution maps were presented. We conclude that although ML and geographic information system–based tools are broadly applicable, sufficient data are essential to build a model that performs well on these evaluation metrics, and that multiscale geographically weighted regression can further help characterize model performance.
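
The four evaluation metrics named above can all be derived from the confusion matrix of a two-class (high/low) prediction. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `binary_metrics`, the choice of "high" as the positive class, and the toy labels are assumptions made purely for illustration and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive="high"):
    """Accuracy, Cohen's kappa, sensitivity, and specificity for two classes.

    Hypothetical helper: `positive` marks the class treated as "high prevalence".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    # Observed agreement between predictions and reference labels.
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Kappa rescales accuracy so that chance-level agreement scores 0.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    # Sensitivity: share of true "high" cases that were predicted "high".
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true "low" cases that were predicted "low".
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Toy labels for illustration only (not study data).
y_true = ["high", "high", "high", "low", "low", "low", "low", "high"]
y_pred = ["high", "high", "high", "low", "low", "high", "low", "high"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels the classifier finds every "high" case but mislabels one "low" case, so sensitivity is perfect while specificity and kappa are lower. This shows why a model can lead on one metric without leading on the others.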