Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study

Cited by: 16
Authors
Li, Genghao [1]
Li, Bing [1]
Huang, Langlin [1]
Hou, Sibing [2]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Huixin East St, Beijing 100029, Peoples R China
[2] Columbia Univ, Grad Sch Art & Sci, New York, NY USA
Keywords
depression detection; depression diagnosis; social media; automatic construction; domain-specific lexicon; depression lexicon; label propagation;
DOI
10.2196/17650
Chinese Library Classification number
R-058
Abstract
Background: According to a 2017 World Health Organization report, almost 1 in every 20 people in China had depression. However, depression is usually difficult to diagnose clinically owing to slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people frequently share their daily lives and disclose their inner feelings online, making it possible to identify mental conditions effectively from this rich text information. Much has been achieved with English web-based corpora, but in China the extraction of language features from web-related depression signals is still at a relatively early stage.
Objective: This study aimed to propose an effective approach for constructing a depression-domain lexicon. The lexicon contains language features that could help identify social media users who potentially have depression. We also compared detection performance with and without our lexicon.
Methods: We automatically constructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. Combined, these methods performed well on a specific corpus during construction. The lexicon was built from 111,052 Weibo microblogs posted by 1868 users who were depressed or nondepressed. For depression detection, we considered six features and used five classification methods to test detection performance.
Results: In terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and steadier. When applied to detection models such as logistic regression and support vector machine, our lexicon improved model performance by 2% to 9% and improved the final accuracy of potential depression detection.
Conclusions: Our depression-domain lexicon was proven to be a meaningful input for classification algorithms, providing linguistic insights on the depressive status of test subjects. We believe that this lexicon will enhance early depression detection in people on social media. Future work will need to be carried out on a larger corpus and with more complex methods.
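The Methods summary names three components: word embeddings, a semantic relationship graph, and label propagation. The abstract does not give the authors' exact construction, so the following is only a minimal sketch of the general technique under stated assumptions: the 2-D vectors stand in for Word2Vec embeddings, the English words and the 0.3 similarity threshold are hypothetical, and the function names are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, threshold=0.3):
    """Link every pair of words whose similarity reaches the threshold."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            sim = cosine(vectors[w1], vectors[w2])
            if sim >= threshold:
                graph[w1][w2] = sim
                graph[w2][w1] = sim
    return graph

def propagate_labels(graph, seeds, iterations=50):
    """Spread seed scores over the graph; seed words stay clamped."""
    scores = {w: seeds.get(w, 0.0) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]
            elif neighbours:
                total = sum(neighbours.values())
                updated[word] = sum(
                    weight * scores[n] for n, weight in neighbours.items()
                ) / total
            else:
                updated[word] = 0.0
        scores = updated
    return scores

# Hypothetical toy embeddings; a real run would load trained Word2Vec vectors.
vectors = {
    "hopeless": (1.0, 0.1),
    "insomnia": (0.9, 0.2),
    "sunny": (0.1, 1.0),
    "picnic": (0.2, 0.9),
}
# Assumed seed labels: +1 for a depression-related word, -1 for an unrelated one.
seeds = {"hopeless": 1.0, "sunny": -1.0}

graph = build_graph(vectors)
scores = propagate_labels(graph, seeds)
# Words that end with a positive score are kept as lexicon candidates.
lexicon = sorted(w for w, s in scores.items() if s > 0)
print(lexicon)
```

Clamping the seed words on every pass keeps the hand-labeled entries fixed while unlabeled words take the similarity-weighted average of their neighbours, which is one common form of label propagation and not necessarily the variant used in the study.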
Pages: 17
Related papers
50 in total
  • [41] Research on Organizational Knowledge Structure's Construction Based on Text Mining
    Qiu, Jiangnan
    Nian, Chuangling
    ELECTRONIC-BUSINESS INTELLIGENCE: FOR CORPORATE COMPETITIVE ADVANTAGES IN THE AGE OF EMERGING TECHNOLOGIES & GLOBALIZATION, 2010, 14 : 403 - 410
  • [42] Identification of Construction Safety Risks Based on Text Mining and LIBSVM Method
    Xu, Yuqing
    Wang, Guangbin
    Xia, Chen
    Cao, Dongping
    CONSTRUCTION RESEARCH CONGRESS 2020: SAFETY, WORKFORCE, AND EDUCATION, 2020, : 40 - 48
  • [43] TEXT MINING-BASED PATENT ANALYSIS OF BIM APPLICATION IN CONSTRUCTION
    Pan, Xing
    Zhong, Botao
    Wang, Xiaobo
    Xiang, Ran
    JOURNAL OF CIVIL ENGINEERING AND MANAGEMENT, 2021, 27 (05) : 303 - 315
  • [44] A Correlation Analysis of Construction Site Fall Accidents Based on Text Mining
    Luo, Xixi
    Liu, Quanlong
    Qiu, Zunxiang
    FRONTIERS IN BUILT ENVIRONMENT, 2021, 7
  • [45] Domain Based Semantic Compression for Automatic Text Comprehension Augmentation and Recommendation
    Ceglarek, Dariusz
    Haniewicz, Konstanty
    Rutkowski, Wojciech
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT II: THIRD INTERNATIONAL CONFERENCE, ICCCI 2011, 2011, 6923 : 40 - +
  • [46] Automatic recommendation of prognosis measures for mechanical components based on massive text mining
    Martinez-Gil, Jorge
    Freudenthaler, Bernhard
    Natschlaeger, Thomas
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2018, 14 (04) : 480 - 494
  • [47] Automatic Recommendation of Prognosis Measures for Mechanical Components based on Massive Text Mining
    Martinez-Gil, Jorge
    Freudenthaler, Bernhard
    Natschlaeger, Thomas
    19TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES (IIWAS2017), 2017, : 32 - 39
  • [48] RETRACTED: Text Mining Based on the Lexicon-Constrained Network in the Context of Big Data (Retracted Article)
    Wan, Boyan
    Sohail, Mishal
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [49] A TOOL SUPPORTING MINING BASED APPROACH SELECTION TO AUTOMATIC ONTOLOGY CONSTRUCTION
    Konys, Agnieszka
    PROCEEDINGS OF THE EUROPEAN CONFERENCE ON DATA MINING 2015 AND INTERNATIONAL CONFERENCES ON INTELLIGENT SYSTEMS AND AGENTS 2015 AND THEORY AND PRACTICE IN MODERN COMPUTING 2015, 2015, : 3 - 10
  • [50] Semi-automatic construction of ontology based on data mining technique
    Wang, Jingyun
    Flanagan, Brendan
    Ogata, Hiroaki
    2017 6TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2017, : 511 - 515