Lexical Normalization of User-Generated Medical Forum Data

Cited by: 0
Authors
Dirkson, Anne [1]
Verberne, Suzan [1]
Kraaij, Wessel [1]
Affiliation
[1] Leiden Univ, LIACS, Niels Bohrweg 1, Leiden, Netherlands
Keywords
CORPUS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 score of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
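For readers unfamiliar with the detection metric quoted in the abstract, the following is a minimal sketch of the standard F-beta score, which with beta = 0.5 weights precision more heavily than recall. The function name `f_beta` and the example precision and recall values are illustrative assumptions; they are not taken from the paper, whose underlying precision and recall are not given in this record.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta = 1 gives F1.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        # Both precision and recall are zero; the score is defined as 0 here.
        return 0.0
    return (1.0 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show how beta shifts the score.
example_precision = 0.9
example_recall = 0.6
f_half = f_beta(example_precision, example_recall, beta=0.5)
f_one = f_beta(example_precision, example_recall, beta=1.0)
```

When precision exceeds recall, as in these made-up values, the F0.5 score comes out higher than F1. This is why F0.5 suits a detector for which a false alarm costs more than a missed error.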
Pages: 11-20
Page count: 10