LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

Cited by: 14
Authors
Liu, Mingyi [1]
Tu, Zhiying [1]
Zhang, Tong [1]
Su, Tonghua [1]
Xu, Xiaofei [1]
Wang, Zhongjie [1]
Affiliation
[1] Harbin Inst Technol, Fac Comp, Harbin, Peoples R China
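Funding
National Science Foundation (USA);
Keywords
Active learning; Learning strategies; Named entity recognition; CRF;
DOI
10.1007/s11063-021-10737-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract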
In recent years, deep learning has achieved great success in many natural language processing tasks, including named entity recognition. Its shortcoming is that it usually requires a large quantity of manually annotated data. Previous studies have demonstrated that active learning can considerably reduce the cost of data annotation, but there is still plenty of room for improvement. In real applications, we found that existing uncertainty-based active learning strategies have two shortcomings. First, these strategies explicitly or implicitly prefer long sequences, which increases the burden on annotators. Second, some strategies require revising the model to generate additional information for sample selection, which increases the developer's workload and the model's training and prediction time. In this paper, we first examine traditional active learning strategies on two models widely used in named entity recognition, Word2Vec-BiLSTM-CRF and BERT-CRF, across several typical datasets. We then propose an uncertainty-based active learning strategy called lowest token probability (LTP), which combines the input and output of the conditional random field (CRF) to select informative instances. LTP is a simple and powerful strategy that does not favor long sequences and does not require revising the model. We test LTP on multiple real-world datasets; the experimental results show that, compared with existing state-of-the-art selection strategies, LTP reduces the number of annotated tokens by about 20% while maintaining competitive sentence-level accuracy and entity-level F1-score. Additionally, LTP significantly outperforms all other strategies in selecting valid samples, which dramatically reduces how often labelers perform invalid annotations.
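The abstract only states that LTP "combines the input and output of CRF". The sketch below is a minimal illustration under an assumed reading, not the authors' implementation: the CRF input is taken to be the per-token emission score matrix produced by the encoder (BiLSTM or BERT), the CRF output is the Viterbi-decoded label path, each token's probability is the softmax of its emission scores read at its decoded label, and a sentence's score is the lowest such token probability. Sentences with the lowest scores are selected for annotation first. The names `viterbi`, `softmax`, `ltp_score`, and `select_batch` are hypothetical, and start/end transition scores are omitted for brevity.

```python
import math
from typing import List, Sequence

# A sentence's emission matrix: rows are tokens, columns are labels.
Matrix = List[List[float]]


def viterbi(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Return the highest-scoring label path of a linear-chain CRF.

    emissions[t][j]   : score of label j at token t (assumed CRF input)
    transitions[i][j] : score of moving from label i to label j
    Assumes a non-empty sentence; start/end transitions are omitted.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Best previous label i for ending at label j on token t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's emission scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def ltp_score(emissions: Matrix, transitions: Matrix) -> float:
    """Lowest probability, over tokens, of the label the CRF decoded.

    Hypothetical token probability: softmax of the emission row, read at
    the Viterbi label. A minimum (not a sum) does not accumulate over tokens.
    """
    path = viterbi(emissions, transitions)
    return min(softmax(row)[label] for row, label in zip(emissions, path))


def select_batch(pool: List[Matrix], transitions: Matrix, k: int) -> List[int]:
    """Indices of the k unlabeled sentences with the lowest LTP score."""
    scores = [ltp_score(e, transitions) for e in pool]
    return sorted(range(len(pool)), key=lambda idx: scores[idx])[:k]
```

For example, with two labels, emissions `[[2.0, 0.0], [1.0, 0.0]]`, and a transition matrix `[[-10.0, 0.0], [0.0, 0.0]]` that penalizes label 0 followed by label 0, the decoded path is `[0, 1]`. The second token's emissions give label 1 a probability of only about 0.27, so the sentence receives a low score: the CRF output chose a label that its input considered unlikely. Because the score is a minimum rather than a sum of per-token uncertainties, it does not grow simply by adding tokens, which is in the spirit of the abstract's claim that LTP does not favor long sequences. It also needs only the emissions and the decoded path, which a trained CRF model already produces.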
Pages: 2433-2454
Number of pages: 22