Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study

Cited by: 16
Authors
Li, Genghao [1]
Li, Bing [1]
Huang, Langlin [1]
Hou, Sibing [2]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Huixin East St, Beijing 100029, Peoples R China
[2] Columbia Univ, Grad Sch Art & Sci, New York, NY USA
Keywords
depression detection; depression diagnosis; social media; automatic construction; domain-specific lexicon; depression lexicon; label propagation
DOI
10.2196/17650
Chinese Library Classification (CLC) number
R-058
Abstract
Background: According to a 2017 World Health Organization report, almost one in every 20 people in China had depression. However, clinical diagnosis of depression is usually difficult owing to slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people frequently share their daily lives and disclose their inner feelings online, making it possible to identify mental conditions effectively from this rich text information. Much has been achieved with English web-based corpora, but in research in China, the extraction of language features from web-related depression signals is still at a relatively early stage.
Objective: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. This lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared detection performance with and without our lexicon.
Methods: We autoconstructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. Combined, these methods performed well on a specific corpus during construction. The lexicon was obtained from 111,052 Weibo microblogs posted by 1868 users who were depressed or nondepressed. For depression detection, we considered six features and used five classification methods to test detection performance.
Results: The experimental results showed that, in terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and more stable. When applied to detection models such as logistic regression and support vector machine, our lexicon helped the models outperform their lexicon-free counterparts by 2% to 9% and improved the final accuracy of potential depression detection.
Conclusions: Our depression-domain lexicon was proven to be a meaningful input for classification algorithms, providing linguistic insights on the depressive status of test subjects. We believe that this lexicon will enhance early depression detection in people on social media. Future work will need to be carried out on a larger corpus and with more complex methods.
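The abstract names the building blocks (word embeddings, a semantic relationship graph, and label propagation) but not their details. The following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation: the three-dimensional toy vectors stand in for Word2Vec embeddings, the cosine-similarity threshold, the seed words, and the function names `build_graph` and `propagate_labels` are all hypothetical, and the clamped-seed weighted averaging is one generic variant of label propagation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, threshold=0.8):
    """Link two words when their embedding similarity reaches the threshold."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            sim = cosine(vectors[w1], vectors[w2])
            if sim >= threshold:
                graph[w1][w2] = sim
                graph[w2][w1] = sim
    return graph

def propagate_labels(graph, seeds, iterations=20):
    """Spread seed scores over the graph; seed words stay clamped."""
    scores = {w: seeds.get(w, 0.0) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbors in graph.items():
            if word in seeds:
                updated[word] = seeds[word]  # keep labeled seeds fixed
                continue
            total = sum(neighbors.values())
            # Similarity-weighted average of the neighbors' current scores.
            updated[word] = (
                sum(wt * scores[n] for n, wt in neighbors.items()) / total
                if total else 0.0
            )
        scores = updated
    return scores

# Toy vectors: assumed stand-ins for embeddings trained on a microblog corpus.
toy_vectors = {
    "hopeless": [0.9, 0.1, 0.0],
    "insomnia": [0.8, 0.3, 0.1],
    "tired": [0.7, 0.4, 0.2],
    "sunshine": [0.0, 0.2, 0.9],
    "picnic": [0.1, 0.1, 0.8],
}
# Hypothetical seeds: 1.0 = depression-related, 0.0 = unrelated.
seeds = {"hopeless": 1.0, "sunshine": 0.0}

graph = build_graph(toy_vectors)
scores = propagate_labels(graph, seeds)
lexicon = sorted(w for w, s in scores.items() if s >= 0.5)
print(lexicon)
```

In this toy setup, words connected to the depression-related seed inherit a high score and enter the candidate lexicon, while words connected only to the neutral seed do not. A real pipeline would train the embeddings on the corpus and tune the threshold and seed set empirically.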
Pages: 17