Lexical Normalization of User-Generated Medical Forum Data

Cited by: 0
Authors
Dirkson, Anne [1]
Verberne, Suzan [1]
Kraaij, Wessel [1]
Affiliations
[1] Leiden Univ, LIACS, Niels Bohrweg 1, Leiden, Netherlands
Keywords
CORPUS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 score of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
Pages: 11-20
Number of pages: 10