BiLSTM-CRF Manipuri NER with Character-Level Word Representation

Cited by: 6
Authors
Jimmy, Laishram [1 ]
Nongmeikappam, Kishorjit [2 ]
Naskar, Sudip Kumar [3 ]
Affiliations
[1] Manipur Tech Univ, Imphal, Manipur, India
[2] Indian Inst Informat Technol Manipur, Imphal, Manipur, India
[3] Jadavpur Univ, Kolkata, W Bengal, India
Keywords
Manipuri; Named entity recognition and classification; LSTM; CRF; Embeddings; Deep neural networks; Recurrent neural networks; NAMED ENTITY RECOGNITION; MODEL;
DOI
10.1007/s13369-022-06933-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
Named Entity Recognition and Classification (NER) serves as a foundation for many natural language processing tasks such as question answering, text summarization, news/document clustering and machine translation. Early NER systems for Manipuri are based on machine learning approaches and employ handcrafted morphological features and domain-specific rules. Such domain-specific rules are hard to extract for Manipuri NER because the language is highly agglutinative, inflectional and falls in the category of low-resource languages. In recent years, deep learning, empowered by continuous vector representation and semantic composition through non-linear processing, has been employed in various NER tasks, yielding state-of-the-art accuracy. In this paper, we propose a Manipuri NER model using a Bidirectional Long Short Term Memory (BiLSTM) deep neural network in unison with an embedding technique. The embedding technique is a BiLSTM character-level word representation in conjunction with word embedding, which acts as a feature for the BiLSTM NER model. The proposed model also employs a Conditional Random Field (CRF) classifier to capture the dependency among output NER tags. Various Gradient Descent (GD) optimizers were experimented with to establish an efficient GD optimizer for accurate NER. The NER model with the RMSprop GD optimizer achieved an F-score of approximately 98.19% at learning rate eta = 0.001 and decay constant rho = 0.9. Further, an intrinsic evaluation of the word embedding shows that the proposed embedding technique can capture the semantic and syntactic rules of the language with 88.14% average clustering accuracy over all NE classes.
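A minimal sketch of the architecture the abstract describes is given below. It is not the authors' code: it assumes PyTorch (the framework is not stated in the abstract), all vocabulary and layer sizes are illustrative, and the CRF layer that models dependencies among output tags is only indicated in a comment. The RMSprop settings mirror the reported eta = 0.001 and rho = 0.9.

# Illustrative sketch, not the authors' implementation.
import torch
import torch.nn as nn

class CharWordBiLSTMTagger(nn.Module):
    def __init__(self, char_vocab, word_vocab, num_tags,
                 char_dim=30, char_hidden=25, word_dim=100, word_hidden=100):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # BiLSTM over the characters of each word; its final forward and
        # backward states form the character-level word representation.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        # Sentence-level BiLSTM over [word embedding ; char representation].
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 batch_first=True, bidirectional=True)
        # Per-token emission scores; a CRF layer (omitted here) would sit on
        # top to capture dependencies among the output NER tags.
        self.emissions = nn.Linear(2 * word_hidden, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, s, w = char_ids.shape
        chars = self.char_emb(char_ids.view(b * s, w))
        _, (h_n, _) = self.char_lstm(chars)          # h_n: (2, b*s, char_hidden)
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1).view(b, s, -1)
        tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        out, _ = self.word_lstm(tokens)
        return self.emissions(out)                   # (batch, seq_len, num_tags)

# Illustrative sizes; RMSprop hyperparameters follow the abstract.
model = CharWordBiLSTMTagger(char_vocab=80, word_vocab=20000, num_tags=9)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)

# Smoke test with random indices (shapes only, not real data).
words = torch.randint(1, 20000, (2, 7))
chars = torch.randint(1, 80, (2, 7, 12))
scores = model(words, chars)                         # (2, 7, 9) emission scores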
Pages: 1715-1734
Page count: 20
Related Papers
38 records in total
  • [1] BiLSTM-CRF Manipuri NER with Character-Level Word Representation
    Laishram Jimmy
    Kishorjit Nongmeikappam
    Sudip Kumar Naskar
    Arabian Journal for Science and Engineering, 2023, 48: 1715-1734
  • [2] A Character-Level BiLSTM-CRF Model With Multi-Representations for Chinese Event Detection
    Mu, Xiaofeng
    Xu, Aiping
    IEEE Access, 2019, 7: 146524-146532
  • [3] Incorporating lexicon and character glyph and morphological features into BiLSTM-CRF for Chinese medical NER
    Yang, Jiang
    Wang, Hongman
    Tang, Yuting
    Yang, Fangchun
    2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), 2021: 12-17
  • [4] A BiLSTM-CRF Based Approach to Word Segmentation in Chinese
    Jin, Yuanyuan
    Tao, Shiyu
    Liu, Qi
    Liu, Xiaodong
    2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2022: 568-571
  • [5] Integrating Character-level and Word-level Representation for Affect in Arabic Tweets
    Alharbi, Abdullah I.
    Smith, Phillip
    Lee, Mark
    Data & Knowledge Engineering, 2022, 138
  • [6] Chinese Negative Semantic Representation and Annotation Combined with Hybrid Attention Mechanism and BiLSTM-CRF
    Li, Jinrong
    Lyu, Guoying
    Li, Ru
    Chai, Qinghua
    Wang, Chao
    Computer Engineering and Applications, 2023, 59(9): 167-175
  • [7] Effect of Character and Word Features in Bidirectional LSTM-CRF for NER
    Ronran, Chirawan
    Lee, Seungwoo
    2020 IEEE International Conference on Big Data and Smart Computing (BigComp 2020), 2020: 613-616
  • [8] Combining Character-Level Representation for Relation Classification
    Liang, Dongyun
    Xu, Weiran
    Zhao, Yinge
    Artificial Neural Networks and Machine Learning, Pt II, 2017, 10614: 394-401
  • [9] A joint method for Chinese word segmentation and part-of-speech tagging based on BiLSTM-CRF
    Yuan L.
    Zhongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Central South University (Science and Technology), 2023, 54(8): 3145-3153