Using publicly available data to predict recreational cannabis legalization at the county-level: A machine learning approach

Authors
Montgomery, Barrett Wallace [1]
Tong, Xiaoran [2]
Vsevolozhskaya, Olga [2]
Anthony, James C. [3]
Affiliations
[1] Res Triangle Inst Ind, Res Triangle Pk, NC 27709 USA
[2] Univ Kentucky, Coll Publ Hlth, Dept Biostat, Res Facil, 1 111 Washington Ave, Lexington, KY 40508 USA
[3] Michigan State Univ, Coll Human Med, Dept Epidemiol & Biostat, B601 West Fee Hall,909 Wilson Rd, E Lansing, MI 48824 USA
Funding
National Institutes of Health (United States)
Keywords
Cannabis; Drug policy; Epidemiology; Prediction; Public health law; Ensemble; Machine learning;
DOI
10.1016/j.drugpo.2024.104340
Chinese Library Classification number
R194 [Health standards, health inspection, pharmaceutical administration]
Abstract
Background: Local cannabis policies vary substantially across geographic areas within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalence to predict the legality of recreational cannabis sales within each U.S. county.

Methods: We merged county-level data and selected 14 model inputs from the 2010 Census, the 2012 County Presidential Data from the MIT Elections Lab, and Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH) for 2010 to 2012. A county was labeled as recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, resulting in 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling techniques to build an ensemble of 1000 logistic regressions on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy conditions with the actual outcomes in 2014.

Results: When compared with the actual RCL policies in 2014, the ensemble's predictions of counties transitioning to RCL had a macro-averaged F1 score of 0.61. The main factors associated with legalizing county-level recreational cannabis sales were the prevalences of past-month cannabis use and past-year cocaine use.

Conclusion: By leveraging publicly available data from 2010 to 2012, our model achieved appreciable discrimination in predicting counties with legal recreational cannabis sales in 2014; however, there is room for improvement. Having demonstrated model performance in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.
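The evaluation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it covers only the leave-one-state-out split, a plain random form of minority oversampling, and the macro-averaged F1 metric, and it leaves out the synthetic data augmentation and the ensemble of 1000 logistic regressions. The function names, the `state` and `rcl` field names, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def leave_one_state_out(rows):
    """Yield (held_out_state, train_rows, test_rows), withholding one state at a time."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(row)
    for state in sorted(by_state):
        train = [r for s, group in by_state.items() if s != state for r in group]
        yield state, train, by_state[state]

def oversample_minority(rows, rng):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    pos = [r for r in rows if r["rcl"] == 1]
    neg = [r for r in rows if r["rcl"] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)  # nothing to duplicate when one class is absent
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy counties (not real data): 1 = RCL, 0 = non-RCL.
toy = [
    {"state": "A", "rcl": 1}, {"state": "A", "rcl": 0}, {"state": "A", "rcl": 0},
    {"state": "B", "rcl": 0}, {"state": "B", "rcl": 0},
    {"state": "C", "rcl": 1}, {"state": "C", "rcl": 0},
]
rng = random.Random(0)
for state, train, test in leave_one_state_out(toy):
    balanced = oversample_minority(train, rng)
    print(state, len(train), len(balanced), len(test))

# Macro F1 of a trivial "always non-RCL" rule on the abstract's class counts.
baseline = macro_f1([1] * 92 + [0] * 3002, [0] * 3094)
print(round(baseline, 3))  # about 0.49
```

With 92 RCL and 3002 non-RCL counties, a rule that always predicts non-RCL scores roughly 0.49 under this pooled calculation, which gives some context for the reported 0.61. The paper may aggregate the metric differently, so the comparison is only indicative.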
Pages: 9