Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study

Cited by: 22
Authors
Howard, Derek [1, 2]
Maslej, Marta M. [1, 2]
Lee, Justin [3]
Ritchie, Jacob [1, 4]
Woollard, Geoffrey [5, 6]
French, Leon [1, 2, 7, 8]
Affiliations
[1] Ctr Addict & Mental Hlth, Campbell Family Mental Hlth Res Inst, Toronto, ON, Canada
[2] Ctr Addict & Mental Hlth, Krembil Ctr Neuroinformat, 250 Coll St, Toronto, ON M5T 1R8, Canada
[3] Univ Toronto, Dept Biochem, Toronto, ON, Canada
[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Toronto, Dept Med Biophys, Toronto, ON, Canada
[6] Univ Hlth Network, Princess Margaret Canc Ctr, Toronto, ON, Canada
[7] Univ Toronto, Inst Med Sci, Toronto, ON, Canada
[8] Univ Toronto, Dept Psychiat, Div Brain & Therapeut, Toronto, ON, Canada
Funding
Canada Foundation for Innovation;
Keywords
triage; classification; natural language processing; transfer learning; machine learning; data interpretation, statistical; mental health; social support; AGE-OF-ONSET; PEER SUPPORT; MENTAL-DISORDERS; METAANALYSIS;
DOI
10.2196/15371
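Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract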
Background: Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those affected, and they also generate large amounts of data that can be mined to predict mental health states using machine learning methods.
Objective: This study aimed to benchmark multiple methods of text feature representation for social media posts and to compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or for moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response.
Methods: We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task, collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner (VADER), Empath, and Linguistic Inquiry and Word Count (LIWC), and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used the Tree-based Pipeline Optimization Tool (TPOT) and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts.
Results: The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. This system had a macroaveraged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. We also present visualizations that aid in understanding the learned classifiers.
Conclusions: In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data, and we noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
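The macroaveraged F1 score reported in the Results is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. The following is a minimal sketch of that computation in plain Python, not the study's evaluation code. The function name `macro_f1`, the `labels` parameter, and the toy labels are hypothetical, and which classes the shared task actually averages over is not stated in this record.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores.

    `labels` restricts the average to a subset of classes; by default
    every class seen in either list is included (an assumption made
    for this sketch).
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented purely for illustration
    truth = ["low", "low", "high", "high"]
    preds = ["low", "high", "high", "high"]
    print(round(macro_f1(truth, preds), 3))
```

Passing an explicit `labels` list averages over only those classes, which is how a task-specific variant that excludes a majority class could be expressed.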
Pages: 14