Weakly labeled data augmentation for social media named entity recognition

Cited by: 4
Authors
Kim, Juae [1]
Kim, Yejin [2]
Kang, Sangwoo [3]
Affiliations
[1] AIRS Co, Hyundai Motor Grp, Seoul 06620, South Korea
[2] George Washington Univ, Dept Comp Sci, Graph Lab, Washington, DC 20037 USA
[3] Gachon Univ, Sch Comp, Gyeonggi Do 13120, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Named entity recognition; Social-media text mining; Weakly labeled data; Transfer learning;
DOI
10.1016/j.eswa.2022.118217
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named entity recognition (NER) is the task of extracting entities that belong to predefined categories. Although NER is important for processing user-generated texts such as those obtained from social media, it remains challenging because such texts tend to contain numerous unseen words and abbreviations. To address this issue, we propose two methods for generating weakly labeled data, alias augmentation and typo augmentation, that help extract named entities from social media texts more effectively. With these methods, weakly labeled data are generated by automatically annotating unlabeled Wikipedia texts and Tweets, and the model is then trained on these data through transfer learning. Our experimental results suggest that the proposed approach improves NER performance, with our best F1-score of 51.43% representing the highest score ever reported.
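The abstract names alias augmentation and typo augmentation but does not describe how either is implemented. The following is a minimal sketch under assumed definitions, not the authors' method: aliases and single-character typo variants of known entity names are added to a gazetteer, and unlabeled text is then annotated automatically with BIO tags by longest dictionary match. The function names (`expand_gazetteer`, `typo_variants`, `weak_label`), the edit operations, and the toy gazetteer are all hypothetical.

```python
import random


def expand_gazetteer(gazetteer, aliases):
    """Alias augmentation (assumed form): copy each entity's type to its aliases."""
    expanded = dict(gazetteer)
    for canonical, alternatives in aliases.items():
        entity_type = gazetteer.get(canonical)
        if entity_type is None:
            continue  # alias list for an entity we have no type for
        for alias in alternatives:
            expanded.setdefault(alias.lower(), entity_type)
    return expanded


def typo_variants(name, rng, n=3, max_attempts=50):
    """Typo augmentation (assumed form): up to n one-edit misspellings of name."""
    if not name:
        return []
    variants = set()
    for _ in range(max_attempts):
        if len(variants) >= n:
            break
        chars = list(name)
        i = rng.randrange(len(chars))
        op = rng.choice(["delete", "swap", "duplicate"])
        if op == "delete" and len(chars) > 1:
            del chars[i]
        elif op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "duplicate":
            chars.insert(i, chars[i])
        candidate = "".join(chars)
        if candidate != name:
            variants.add(candidate)
    return sorted(variants)


def weak_label(tokens, gazetteer, max_len=4):
    """Assign BIO tags by longest case-insensitive match against the gazetteer."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(t.lower() for t in tokens[i:i + span])
            if key in gazetteer:
                entity_type = gazetteer[key]
                labels[i] = "B-" + entity_type
                for j in range(i + 1, i + span):
                    labels[j] = "I-" + entity_type
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Toy data, invented for illustration only.
base = {"new york city": "LOC", "starbucks": "ORG"}
aliases = {"new york city": ["NYC", "Big Apple"]}
gazetteer = expand_gazetteer(base, aliases)

# Register typo variants of a single-token name under the same entity type.
for variant in typo_variants("starbucks", random.Random(0)):
    gazetteer.setdefault(variant, "ORG")

print(weak_label("moving to nyc next week".split(), gazetteer))
```

In a setup like the one the abstract describes, sentences labeled this way would form the weakly labeled corpus used for an initial training stage, before transfer to the smaller human-annotated data. Dictionary matching of this kind is noisy, which is why the resulting labels are called weak.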
Pages: 10
Related papers
50 in total
  • [21] EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition
    YU Hongfei
    NI Kunyu
    XU Rongkang
    YU Wenjun
    HUANG Yu
    Wuhan University Journal of Natural Sciences, 2023, 28 (04) : 299 - 308
  • [22] A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition
    Li, Qingqing
    Huang, Zhen
    Dou, Yong
    Zhang, Ziwen
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2021, PT II, 2021, 12816 : 88 - 100
  • [23] An improved data augmentation approach and its application in medical named entity recognition
    Chen, Hongyu
    Dan, Li
    Lu, Yonghe
    Chen, Minghong
    Zhang, Jinxia
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [24] Data augmentation via context similarity: An application to biomedical Named Entity Recognition
    Bartolini, Ilaria
    Moscato, Vincenzo
    Postiglione, Marco
    Sperli, Giancarlo
    Vignali, Andrea
    INFORMATION SYSTEMS, 2023, 119
  • [25] Social Media Named Entity Recognition Based On Graph Attention Network
    Zhang, Wei
    Luo, Jianying
    Yang, Kehua
    2021 INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROLS (ISCSIC 2021), 2021 : 127 - 131
  • [26] A Survey of Deep Learning for Named Entity Recognition in Chinese Social Media
    Liu, Jingxin
    Cheng, Jieren
    Wang, Ziyan
    Lou, Congqiang
    Shen, Chenli
    Sheng, Victor S.
    ARTIFICIAL INTELLIGENCE AND SECURITY, ICAIS 2022, PT I, 2022, 13338 : 573 - 582
  • [27] Constrained Labeled Data Generation for Low-Resource Named Entity Recognition
    Guo, Ruohao
    Roth, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021 : 4519 - 4533
  • [28] End-to-End Deep Framework for Disease Named Entity Recognition Using Social Media Data
    Miftahutdinov, Zulfat
    Tutubalina, Elena
    2017 IEEE 30TH NEUMANN COLLOQUIUM (NC), 2017 : 47 - 52
  • [29] Named Entity Recognition in Chinese Rice Breeding Questions Based on Text Data Augmentation
    Niu, Peiyu
    Hou, Chen
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2024, 55 (08) : 333 - 343
  • [30] RoPDA: Robust Prompt-Based Data Augmentation for Low-Resource Named Entity Recognition
    Song, Sihan
    Shen, Furao
    Zhao, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024 : 19017 - 19025