Weakly labeled data augmentation for social media named entity recognition

Cited by: 4
Authors
Kim, Juae [1]
Kim, Yejin [2]
Kang, Sangwoo [3]
Affiliations
[1] AIRS Co, Hyundai Motor Grp, Seoul 06620, South Korea
[2] George Washington Univ, Dept Comp Sci, Graph Lab, Washington, DC 20037 USA
[3] Gachon Univ, Sch Comp, Gyeonggi Do 13120, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Named entity recognition; Social-media text mining; Weakly labeled data; Transfer learning;
DOI
10.1016/j.eswa.2022.118217
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named entity recognition (NER) is the task of extracting entities that belong to predefined categories. Although NER is important for processing user-generated text such as social media posts, it remains challenging because such text tends to contain many unseen words and abbreviations. To address this issue, we propose two methods for generating weakly labeled data that allow named entities to be extracted from social media texts more effectively: alias augmentation and typo augmentation. With these methods, weakly labeled data are generated by automatically annotating unlabeled Wikipedia texts and Tweets, and a model is then trained on these data through transfer learning. Our experimental results suggest that the proposed approach improves NER performance, with our best F1-score of 51.43% representing the highest score ever reported.
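The abstract does not give implementation details, so the following is only a rough sketch of how alias and typo augmentation could feed automatic annotation. Everything in it is an assumption made for illustration: the function names (`typo_variants`, `build_gazetteer`, `weak_label`), the two typo operations (adjacent-character swap and single-character deletion), the greedy longest-match dictionary lookup, and the BIO tag scheme. The transfer-learning step is not shown.

```python
import random


def typo_variants(name, rng, n=3):
    """Return up to n misspellings of `name` (hypothetical typo model:
    swap two adjacent characters or delete one character)."""
    if len(name) < 2:
        return []
    variants = set()
    for _ in range(50):  # bounded number of attempts
        if len(variants) >= n:
            break
        chars = list(name)
        i = rng.randrange(len(chars) - 1)
        if rng.random() < 0.5:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        else:
            del chars[i]
        candidate = "".join(chars)
        if candidate != name:
            variants.add(candidate)
    return sorted(variants)


def build_gazetteer(entities, aliases, rng):
    """Map lowercase token tuples to entity types, covering the canonical
    name, its aliases, and typo variants of both."""
    gazetteer = {}
    for name, etype in entities.items():
        forms = [name] + aliases.get(name, [])
        for form in list(forms):
            forms.extend(typo_variants(form, rng))
        for form in forms:
            key = tuple(form.lower().split())
            if key:
                gazetteer.setdefault(key, etype)
    return gazetteer


def weak_label(tokens, gazetteer):
    """Assign BIO tags by greedy longest match against the gazetteer."""
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                etype = gazetteer[key]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return tags


if __name__ == "__main__":
    rng = random.Random(0)
    gaz = build_gazetteer(
        {"New York City": "LOC"},        # toy entity list (assumed)
        {"New York City": ["NYC"]},      # toy alias table (assumed)
        rng,
    )
    print(weak_label("I love NYC".split(), gaz))
```

Labels produced this way are noisy by construction (a dictionary match ignores context), which is why such data are described as weakly labeled rather than gold-standard.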
Pages: 10
Related papers
50 in total
  • [1] Improving Named Entity Recognition for Social Media with Data Augmentation
    Liu, Wenzhong
    Cui, Xiaohui
    APPLIED SCIENCES-BASEL, 2023, 13 (09):
  • [2] Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data
    Jiang, Haoming
    Zhang, Danqing
    Cao, Tianyu
    Yin, Bing
    Zhao, Tuo
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1775 - 1789
  • [3] Named Entity Recognition for Social Media Texts with Semantic Augmentation
    Nie, Yuyang
    Tian, Yuanhe
    Wan, Xiang
    Song, Yan
    Dai, Bo
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1383 - 1391
  • [4] Novel data augmentation for named entity recognition
    Hemateja A.V.N.M.
    Kondakath G.
    Das S.
    Kothandaraman M.
    Shoba S.
    Pandey A.
    Babu R.
    Jain A.
    International Journal of Speech Technology, 2023, 26 (4) : 869 - 878
  • [5] Two-perspective Biomedical Named Entity Recognition with Weakly Labeled Data Correction
    Zhou, Huiwei
    Liu, Zhe
    Lang, Chengkun
    Xu, Yibin
    Du, Lei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 941 - 944
  • [6] Correction to: Novel data augmentation for named entity recognition
    Aluru V. N. M. Hemateja
    Gopikrishnan Kondakath
    Susruta Das
    Mohanaprasad Kothandaraman
    S. Shoba
    Abhishek Pandey
    Rajin Babu
    Abhinav Jain
    International Journal of Speech Technology, 2023, 26 (4) : 879 - 879
  • [7] Data Augmentation for Chinese Clinical Named Entity Recognition
    Wang P.-H.
    Li M.-Z.
    Li S.
    Li, Si (lisi@bupt.edu.cn), 1600, Beijing University of Posts and Telecommunications (43): : 84 - 90
  • [8] Data Augmentation Techniques on Arabic Data for Named Entity Recognition
    Sabty, Caroline
    Omar, Islam
    Wasfalla, Fady
    Islam, Mohamed
    Abdennadher, Slim
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 292 - 299
  • [9] Semi-Supervised Learning for Named Entity Recognition Using Weakly Labeled Training Data
    Zafarian, Atefeh
    Rokni, Ali
    Khadivi, Shahram
    Ghiasifard, Sonia
    2015 INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2015, : 129 - 135
  • [10] Grounded Multimodal Named Entity Recognition on Social Media
    Yu, Jianfei
    Li, Ziyan
    Wang, Jieming
    Xia, Rui
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 9141 - 9154