Thai Named Entity Corpus Annotation Scheme and Self Verification by BiLSTM-CNN-CRF

Authors
Sornlertlamvanich, Virach [1, 3]
Suriyachay, Kitiya [2 ]
Charoenporn, Thatsanee [1 ]
Affiliations
[1] Musashino Univ, Fac Data Sci, Tokyo, Japan
[2] Thammasat Univ, Sch ICT, Sirindhorn Int Inst Technol, Pathum Thani, Thailand
[3] Thammasat Univ, Fac Engn, Pathum Thani, Thailand
Keywords
Corpus annotation; Named entity recognition; Thai named entity; Thai corpus
DOI
10.1007/978-3-031-05328-3_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A corpus is an essential part of language research, especially for low-resource languages. For research results to be reliable, the corpus used must itself be accurate and consistent. The Thai language has characteristics that make corpus building difficult and introduce errors into the resulting corpora. This paper therefore proposes an effective and efficient approach to cleaning up an existing named entity (NE) corpus before it is used in language research. The THAI-NEST corpus is adopted, checked for consistency and integrity of the data, and re-designed with our proposed model. The revised corpus is verified by a BiLSTM-CNN-CRF model that combines word, POS, and Thai character cluster (TCC) features. Experimental results show the effectiveness of the verification, which increased accuracy by up to 12%, and the model can effectively detect and handle errors in word segmentation and NE tag consistency.
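As an illustration of what an NE tag consistency check can look like, the following is a minimal rule-based sketch. It is not the authors' method, which relies on a BiLSTM-CNN-CRF model. The sketch assumes BIO-style tags such as B-PER and I-PER, which the abstract does not specify, and the function name find_bio_inconsistencies is hypothetical.

```python
from typing import List, Tuple


def find_bio_inconsistencies(tags: List[str]) -> List[Tuple[int, str]]:
    """Return (index, tag) pairs for tags that break the assumed BIO scheme.

    A tag is flagged when it is malformed, or when an I-<type> tag does not
    directly continue an entity of the same <type>.
    """
    problems = []
    prev_type = None  # entity type of the previous token; None when outside
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix not in ("B", "I") or not ent_type:
            # Malformed tag, e.g. "PER" or "X-LOC"
            problems.append((i, tag))
            prev_type = None
            continue
        if prefix == "I" and prev_type != ent_type:
            # I- tag with no matching B-/I- tag immediately before it
            problems.append((i, tag))
        prev_type = ent_type
    return problems


# Hypothetical tag sequence for one sentence (not taken from THAI-NEST)
tags = ["B-PER", "I-PER", "O", "I-LOC", "B-ORG", "I-PER"]
issues = find_bio_inconsistencies(tags)
print(issues)
```

A check of this kind only catches structural tag errors. Word segmentation errors and inconsistent labelling of the same entity across sentences need a learned model or additional rules.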
Pages: 143-160
Page count: 18