Development and validation of a rheumatoid arthritis case definition: a machine learning approach using data from primary care electronic medical records

Cited by: 0
Authors
Pham, Anh N. Q. [1, 2, 3, 4]
Barber, Claire E. H. [2, 3]
Drummond, Neil [2, 3, 5]
Jasper, Lisa [6]
Klein, Doug [5]
Lindeman, Cliff [7]
Widdifield, Jessica [8, 9]
Williamson, Tyler [2, 3]
Jones, C. Allyson [6]
Affiliations
[1] Simon Fraser Univ, Dept Hlth Sci, Burnaby, BC, Canada
[2] Univ Calgary, Dept Med, Calgary, AB, Canada
[3] Univ Calgary, Dept Community Hlth Sci, Calgary, AB, Canada
[4] Simon Fraser Univ, Pacific Inst Pathogen Pandem & Soc, Burnaby, BC, Canada
[5] Univ Alberta, Dept Family Med, Edmonton, AB, Canada
[6] Univ Alberta, Fac Rehabil Med, Edmonton, AB, Canada
[7] Coll Phys & Surg Alberta, Edmonton, AB, Canada
[8] Sunnybrook Res Inst, Holland Bone & Joint Res Program, Toronto, ON, Canada
[9] Univ Toronto, Inst Hlth Policy Management & Evaluat, ICES, Toronto, ON, Canada
Keywords
Rheumatoid arthritis; Case definition; EMR phenotyping; Electronic medical records; Machine learning; Surveillance
DOI
10.1186/s12911-024-02776-w
Chinese Library Classification (CLC) number
R-058
Discipline classification code
Abstract
Background: Rheumatoid arthritis (RA) is a chronic inflammatory disease that is primarily diagnosed and managed by rheumatologists; however, primary care providers are often the first to encounter RA-related symptoms. This study developed and validated a case definition for RA using national surveillance data from primary care settings.

Methods: This cross-sectional validation study used structured electronic medical record (EMR) data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Using a reference set generated from EMR reviews by five experts, three machine learning steps were applied: a 'bag-of-words' approach to feature generation; feature reduction using a feature importance measure coupled with recursive feature elimination and clustering; and classification using tree-based methods (Decision Tree, Random Forest, and Extreme Gradient Boosting [XGBoost]). The three tree-based algorithms were compared to identify the procedure that produced the best evaluation metrics. Nested cross-validation was used so that models could be tuned, evaluated, and compared simultaneously.

Results: Of 1.3 million patients from seven Canadian provinces, 5,600 people aged 19 years or older were randomly selected. The best-performing algorithm for selecting RA cases was generated by the XGBoost classification method. Based on the feature importance scores in the XGBoost output, a human-readable case definition was created: a patient is identified as an RA case when the text "rheumatoid" occurs at least 2 times in any billing, encounter diagnosis, or health condition table of the patient chart. The final case definition had a sensitivity of 81.6% (95% CI, 75.6-86.4), a specificity of 98.0% (95% CI, 97.4-98.5), a positive predictive value of 76.3% (95% CI, 70.1-81.5), and a negative predictive value of 98.6% (95% CI, 98.0-98.6).

Conclusion: A case definition for RA using primary care EMR data was developed based on the XGBoost algorithm. With high validity metrics, this case definition is expected to be a reliable tool for future epidemiological research and surveillance investigating the management of RA in the CPCSSN dataset.
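
The human-readable rule described in the abstract (at least 2 occurrences of the text "rheumatoid" in the billing, encounter diagnosis, or health condition tables) can be illustrated with a minimal Python sketch. This is not the authors' code. The table names, the chart layout (a dictionary of free-text entries per table), case-insensitive matching, and pooling counts across the three tables are all assumptions made for illustration, since the abstract does not specify them. The validity-metric helper uses the standard 2x2 definitions, and the counts in the usage example are invented, not study data.

```python
from typing import Dict, List

# Hypothetical table names; the actual CPCSSN schema is not given in the abstract.
TABLES = ("billing", "encounter_diagnosis", "health_condition")


def count_rheumatoid(chart: Dict[str, List[str]]) -> int:
    """Count case-insensitive occurrences of 'rheumatoid' pooled over the tables.

    Assumption: occurrences are pooled across the three tables and matched as
    a case-insensitive substring; the abstract leaves both details open.
    """
    return sum(
        text.lower().count("rheumatoid")
        for table in TABLES
        for text in chart.get(table, [])
    )


def is_ra_case(chart: Dict[str, List[str]], threshold: int = 2) -> bool:
    """Flag a chart as an RA case when the count reaches the threshold."""
    return count_rheumatoid(chart) >= threshold


def validity_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Illustrative charts (invented, not study data).
two_mentions = {
    "billing": ["Rheumatoid arthritis"],
    "encounter_diagnosis": ["RHEUMATOID ARTHRITIS, seropositive"],
    "health_condition": [],
}
one_mention = {"billing": ["Rheumatoid arthritis"], "health_condition": ["Gout"]}

print(is_ra_case(two_mentions))  # two mentions across tables
print(is_ra_case(one_mention))   # a single mention only

# Invented counts, used only to show how the four metrics are derived.
metrics = validity_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

A rule of this kind trades some sensitivity for specificity: requiring a second mention filters out single, possibly provisional, entries, which is consistent with the high specificity and lower sensitivity reported in the abstract.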
Pages: 9