A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning

Cited by: 0
Authors
Peng, Cheng [1, 2, 3]
Wang, Xiajun [1, 4]
Li, Qifeng [1, 2, 3]
Yu, Qinyang [1, 2, 3]
Jiang, Ruixiang [1, 2, 3]
Ma, Weihong [1, 2, 3]
Wu, Wenbiao [1, 2, 3]
Meng, Rui [1, 2, 3]
Li, Haiyan [1, 2, 3]
Huai, Heju [1, 2, 3]
Wang, Shuyan [1, 2, 3]
He, Longjuan [5]
Affiliations
[1] Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100097, Peoples R China
[2] Natl Innovat Ctr Digital Technol Anim Husb, Beijing 100097, Peoples R China
[3] Natl Engn Res Ctr Informat Technol Agr, Beijing 100097, Peoples R China
[4] Hubei Univ, Fac Resources & Environm Sci, Wuhan 430061, Peoples R China
[5] Chinese Acad Agr Sci, Inst Agr Econ & Dev, Beijing 100081, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 16
Keywords
pig disease; Chinese named entity recognition; lexicon-enhanced BERT; contrastive learning; small sample;
DOI
10.3390/app14166944
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Featured Application: Our work provides reliable technical support for the information extraction of pig diseases in Chinese. It can be applied to other domain-specific fields, thereby facilitating seamless adaptation for named entity identification across diverse contexts.
Named Entity Recognition (NER) is a fundamental and pivotal stage in the development of various knowledge-based support systems, including knowledge retrieval and question-answering systems. In the domain of pig diseases, Chinese NER models encounter several challenges, such as the scarcity of annotated data, domain-specific vocabulary, diverse entity categories, and ambiguous entity boundaries. To address these challenges, we propose PDCNER, a Pig Disease Chinese Named Entity Recognition method leveraging lexicon-enhanced BERT and contrastive learning. First, we construct a domain-specific lexicon and pre-train word embeddings in the pig disease domain. Second, we integrate lexicon information of pig diseases into the lower layers of BERT using a Lexicon Adapter layer, which employs char-word pair sequences. Third, to enhance feature representation, we propose a lexicon-enhanced contrastive loss layer on top of BERT. Finally, a Conditional Random Field (CRF) layer is employed as the model's decoder. Experimental results show that our proposed model demonstrates superior performance over several mainstream models, achieving a precision of 87.76%, a recall of 86.97%, and an F1-score of 87.36%. The proposed model outperforms BERT-BiLSTM-CRF and LEBERT by 14.05% and 6.8%, respectively, with only 10% of the samples available, showcasing its robustness in data scarcity scenarios. Furthermore, the model exhibits generalizability across publicly available datasets.
Our work provides reliable technical support for the information extraction of pig diseases in Chinese and can be easily extended to other domains, thereby facilitating seamless adaptation for named entity identification across diverse contexts.
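As a quick consistency check on the figures quoted in the abstract, the short sketch below recomputes the F1-score as the harmonic mean of the reported precision and recall. The function name `f1_score` and the variable names are illustrative choices, not taken from the paper, and the sketch assumes the reported F1 is the standard harmonic mean of the two reported values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 87.76
reported_recall = 86.97

# Assumption: the reported F1 is the plain harmonic mean of these two values.
f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 87.36
```

Under that assumption, the recomputed value agrees with the 87.36% F1-score stated in the abstract.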
Pages: 16
Related papers
50 in total
  • [21] Research on Named Entity Recognition in Ancient Chinese Based on Incremental Pre-training and Domain Lexicon
    Kang, Wenjun
    Zuo, Jiali
    Dai, Qili
    Hu, Yiyu
    Wang, Mingwen
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT I, NLPCC 2024, 2025, 15359 : 483 - 503
  • [22] Named Entity Recognition Method for Educational Emergency Field Based on BERT
    Wei, Kangwei
    Wen, Bin
    PROCEEDINGS OF 2021 IEEE 12TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2021, : 145 - 149
  • [23] Named entity recognition method in health preserving field based on BERT
    Zhang, Qiang
    Sun, Yong
    Zhang, Linlin
    Jiao, Yanfei
    Tian, Yue
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY, 2021, 183 : 212 - 220
  • [24] A Named Entity Recognition Method Enhanced with Lexicon Information and Text Local Feature
    Ma, Yuekun
    Liu, He
    Zhang, Dezheng
    Gao, Chang
    Liu, Yujue
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2023, 30 (03): : 899 - 906
  • [25] Chinese agricultural diseases named entity recognition based on BERT-CRF
    Zhang, Suoxiang
    Zhao, Ming
    2020 5TH INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE 2020), 2020, : 1144 - 1147
  • [26] Chinese Named Entity Recognition Based on BERT and Lightweight Feature Extraction Model
    Yang, Ruisen
    Gan, Yong
    Zhang, Chenfang
    INFORMATION, 2022, 13 (11)
  • [27] A Flat-Span Contrastive Learning Method for Nested Named Entity Recognition
    Liu, Yaodi
    Zhang, Kun
    Tong, Rong
    Cai, Chenxi
    Chen, Dianying
    Wu, Xiaohe
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 37 - 42
  • [28] A Chinese named entity recognition method for small-scale dataset based on lexicon and unlabeled data
    Shaobin Huang
    Yongpeng Sha
    Rongsheng Li
    Multimedia Tools and Applications, 2023, 82 : 2185 - 2206
  • [29] A Chinese named entity recognition method for small-scale dataset based on lexicon and unlabeled data
    Huang, Shaobin
    Sha, Yongpeng
    Li, Rongsheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (02) : 2185 - 2206
  • [30] Chinese Named Entity Recognition Method for Domain-Specific Text
    Liu, He
    Ma, Yuekun
    Gao, Chang
    Jia, Qi
    Zhang, Dezheng
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2023, 30 (06): : 1799 - 1808