Large-scale Analysis of Free-Text Data for Mental Health Surveillance with Topic Modelling

Cited by: 0
Authors
Gu, Yang [1]
Leroy, Gondy [1]
Affiliations
[1] Univ Arizona, Tucson, AZ 85721 USA
Source
Keywords
Natural language processing; NLP; healthcare analytics; topic modelling; LDA; autism; ASD
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Autism spectrum disorder (ASD) affects 1 in 59 children in the US and costs the US economy $66 billion annually. The Centers for Disease Control and Prevention (CDC) has collected a large set of electronic health records (EHRs) as part of its surveillance in the US. In Arizona, the dataset contains 4,480 EHRs with 10 million free-text tokens collected over ten years, including detailed descriptions of children with ASD-like behaviors. Although knowledge about ASD and its diagnostic criteria have evolved, the data collected in earlier years have not been re-evaluated. To leverage these data more efficiently and uncover causes of the increase in ASD prevalence observed in epidemiological surveillance, we use Latent Dirichlet Allocation (LDA) to analyze the content of the text data automatically. Preliminary results suggest that LDA can model topics in EHR content and show variations in content that are consistent with changes in the data collection effort.
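As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by averaging the per-document topic shares by collection year. It is not the authors' pipeline: the function names (`fit_lda`, `mean_topic_share_by_year`), the hyperparameters, and the toy records are all assumptions made up for this example, and a real study would more likely use an established topic-modelling library with proper preprocessing.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not optimized).

    docs is a list of token lists. Returns (theta, top_words), where
    theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts per (topic, word)
    topic_total = [0] * num_topics                             # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size), key=lambda v: -topic_word[t][v])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words


def mean_topic_share_by_year(years, theta):
    """Average the document-topic shares over all documents from the same year."""
    counts = Counter(years)
    sums = {}
    for year, row in zip(years, theta):
        acc = sums.setdefault(year, [0.0] * len(row))
        for t, p in enumerate(row):
            acc[t] += p
    return {year: [s / counts[year] for s in acc] for year, acc in sums.items()}


# Hypothetical toy records (year, tokens); invented for illustration only.
records = [
    (2000, "delayed speech limited eye contact speech".split()),
    (2000, "eye contact avoided delayed speech".split()),
    (2010, "school evaluation teacher report classroom".split()),
    (2010, "teacher report school evaluation speech".split()),
]
years = [year for year, _ in records]
docs = [tokens for _, tokens in records]

theta, top_words = fit_lda(docs, num_topics=2)
by_year = mean_topic_share_by_year(years, theta)
for year in sorted(by_year):
    print(year, [round(p, 2) for p in by_year[year]])
```

Comparing the yearly averages is one simple way to look for shifts in record content over time; the number of topics and the priors `alpha` and `beta` are arbitrary here and would need tuning on real data.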
Pages: 5