Automated classification of lay health articles using natural language processing: a case study on pregnancy health and postpartum depression

Cited by: 2
Authors
Patra, Braja Gopal [1]
Sun, Zhaoyi [1]
Cheng, Zilin [1]
Kumar, Praneet Kasi Reddy Jagadeesh [1]
Altammami, Abdullah [1]
Liu, Yiyang [1]
Joly, Rochelle [2]
Jedlicka, Caroline [3, 4]
Delgado, Diana [4]
Pathak, Jyotishman [1]
Peng, Yifan [1]
Zhang, Yiye [1]
Affiliations
[1] Weill Cornell Med, Dept Populat Hlth Sci, New York, NY 10065 USA
[2] Weill Cornell Med, Dept Obstet & Gynecol, New York, NY USA
[3] CUNY, Kingsborough Community Coll, New York, NY USA
[4] Weill Cornell Med, Samuel J Wood Lib & CV Starr Biomed Informat Ctr, New York, NY USA
Source
FRONTIERS IN PSYCHIATRY | 2023 / Volume 14
Keywords
online health information; health communication; natural language processing; pregnancy; postpartum depression; INTERNET; STRESS
DOI
10.3389/fpsyt.2023.1258887
Chinese Library Classification (CLC) number
R749 [Psychiatry]
Discipline classification code
100205
Abstract
Objective: Evidence suggests that high-quality health education and effective communication within the framework of social support hold significant potential in preventing postpartum depression. Yet, developing trustworthy and engaging health education and communication materials requires extensive expertise and substantial resources. In light of this, we propose an innovative approach that involves leveraging natural language processing (NLP) to classify publicly accessible lay articles based on their relevance and subject matter to pregnancy and mental health.
Materials and methods: We manually reviewed online lay articles from credible and medically validated sources to create a gold standard corpus. This manual review process categorized the articles based on their pertinence to pregnancy and related subtopics. To streamline and expand the classification procedure for relevance and topics, we employed advanced NLP models such as Random Forest, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer model (gpt-3.5-turbo).
Results: The gold standard corpus included 392 pregnancy-related articles. Our manual review process categorized the reading materials according to lifestyle factors associated with postpartum depression: diet, exercise, mental health, and health literacy. A BERT-based model performed best (F1 = 0.974) in an end-to-end classification of relevance and topics. In a two-step approach, given articles already classified as pregnancy-related, gpt-3.5-turbo performed best (F1 = 0.972) in classifying the above topics.
Discussion: Utilizing NLP, we can guide patients to high-quality lay reading materials as cost-effective, readily available health education and communication sources. This approach allows us to scale the information delivery specifically to individuals, enhancing the relevance and impact of the materials provided.
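The abstract reports model quality as F1 scores. As a minimal sketch of how such a score can be computed for multi-class topic labels, the example below implements macro-averaged F1 in plain Python. The abstract does not state which averaging the authors used, so macro-averaging is an assumption here, and the label lists (including the "not relevant" label) are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per label, then average with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_label, pred_label in zip(y_true, y_pred):
        if gold_label == pred_label:
            tp[gold_label] += 1
        else:
            fp[pred_label] += 1  # predicted label was wrong
            fn[gold_label] += 1  # gold label was missed
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted labels (not taken from the paper's corpus).
gold = ["diet", "exercise", "mental health", "health literacy", "diet", "not relevant"]
pred = ["diet", "exercise", "mental health", "diet", "diet", "not relevant"]
score = macro_f1(gold, pred)
```

Macro-averaging weights every topic equally, so a rarely occurring topic that is always misclassified lowers the score as much as a common one would.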
Pages: 7