Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study

Cited by: 16

Authors:

Li, Genghao [1]

Li, Bing [1]

Huang, Langlin [1]

Hou, Sibing [2]

Affiliations:

[1] Univ Int Business & Econ, Sch Informat Technol & Management, Huixin East St, Beijing 100029, Peoples R China

[2] Columbia Univ, Grad Sch Art & Sci, New York, NY USA

Source:

JMIR MEDICAL INFORMATICS | 2020 / Volume 8 / Issue 06

Keywords:

depression detection; depression diagnosis; social media; automatic construction; domain-specific lexicon; depression lexicon; label propagation

DOI:

10.2196/17650

Chinese Library Classification number:

R-058

Abstract:

Background: According to a World Health Organization report in 2017, almost 1 in every 20 people in China had depression. However, clinical diagnosis of depression is usually difficult owing to slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people frequently share their daily lives and disclose their inner feelings online, which makes it possible to identify mental conditions from the rich text information. Much has been achieved with English web-based corpora, but in China the extraction of language features from web-related depression signals is still at a relatively early stage.

Objective: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. This lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared detection performance with and without our lexicon.

Methods: We autoconstructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. Combined, these methods performed well on a specific corpus during construction. The lexicon was built from 111,052 Weibo microblogs posted by 1868 users who were depressed or nondepressed. For depression detection, we considered six features and used five classification methods to test detection performance.

Results: In terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and more stable. When applied to detection models such as logistic regression and support vector machine, our lexicon improved model performance by 2% to 9% over the same models without the lexicon and improved the final accuracy of potential depression detection.

Conclusions: Our depression-domain lexicon proved to be a meaningful input for classification algorithms, providing linguistic insights into the depressive status of test subjects. We believe that this lexicon will enhance early depression detection among people on social media. Future work will need to be carried out on a larger corpus and with more complex methods.
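
The Methods statement names three ingredients: Word2Vec embeddings, a semantic relationship graph, and label propagation. The following is a minimal sketch of how such a pipeline could fit together, not the authors' implementation. The 2-D vectors stand in for real Word2Vec embeddings, and the seed words, the similarity threshold, and the function names `build_graph` and `propagate` are all assumptions made for illustration.

```python
import math

# Toy 2-D vectors standing in for Word2Vec embeddings (hypothetical values).
VECTORS = {
    "hopeless": (0.9, 0.1),
    "insomnia": (0.8, 0.3),
    "worthless": (0.95, 0.05),
    "tired": (0.7, 0.4),
    "sunny": (0.1, 0.9),
    "picnic": (0.2, 0.95),
    "holiday": (0.15, 0.85),
}

# Seed labels: 1.0 = depression-related, 0.0 = unrelated (hypothetical seeds).
SEEDS = {"hopeless": 1.0, "sunny": 0.0}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_graph(vectors, threshold=0.9):
    """Link two words when their cosine similarity reaches the threshold."""
    graph = {word: {} for word in vectors}
    words = list(vectors)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def propagate(graph, seeds, iterations=50):
    """Spread seed labels along weighted edges; seed scores stay clamped."""
    scores = {word: seeds.get(word, 0.5) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds or not neighbours:
                updated[word] = scores[word]
                continue
            total = sum(neighbours.values())
            updated[word] = sum(
                weight * scores[n] for n, weight in neighbours.items()
            ) / total
        scores = updated
    return scores


graph = build_graph(VECTORS)
scores = propagate(graph, SEEDS)
# Words whose propagated score leans toward the depression seed.
lexicon = sorted(word for word, score in scores.items() if score > 0.5)
print(lexicon)
```

Clamping the seeds keeps the hand-labeled words fixed while each unlabeled word takes the similarity-weighted average of its neighbours, which is one common form of label propagation. The variant and parameters used in the study may differ.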

Pages: 17

