Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism

Cited by: 2
Authors
Benali, B. Ait [1]
Mihi, S. [1]
Mlouk, A. Ait [2]
El Bazi, I. [3]
Laachfoubi, N. [1]
Affiliations
[1] Hassan First Univ Settat, Fac Sci & Tech, IR2M Lab, Settat, Morocco
[2] Uppsala Univ, Dept Informat Technol, Div Sci Comp, Uppsala, Sweden
[3] Sultan Moulay Slimane Univ, Natl Sch Business & Management, Beni Mellal, Morocco
Keywords
Arabic named entity recognition (ANER); natural language processing (NLP); multi-head self-attention; BiLSTM; CRF; dialectal Arabic; social media
DOI
10.3233/JIFS-211944
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named Entity Recognition (NER) is a vital task in Natural Language Processing (NLP) that aims to find named entities in natural language text and classify them into predefined categories such as persons (PER), locations (LOC), and organizations (ORG). In the Arabic context, current deep-learning NER approaches mainly take either word embeddings or character-level embeddings as input. However, relying on a single-granularity representation suffers from out-of-vocabulary (OOV) words, word-embedding errors, and relatively shallow semantic content. This paper integrates a multi-head self-attention mechanism into the BiLSTM-CRF neural network architecture to recognize Arabic named entities on social media using two embeddings. Unlike other state-of-the-art approaches, this approach combines character and word embeddings at the embedding layer, and the attention mechanism computes similarity across the entire character sequence while capturing local context information. The proposed approach improves the recognition of NEs in dialectal Arabic, reaching an F1 score of 74.15% on Darwish's dataset (a publicly available Arabic NER benchmark for social media). To the best of our knowledge, these results outperform the current state-of-the-art models for Arabic NER on social media.
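The abstract describes three components: word and character embeddings joined at the embedding layer, multi-head self-attention that scores similarity across the whole sequence, and a CRF output layer. The Python sketch below illustrates each one; it is not the authors' implementation, and several details are assumptions. The two embeddings are joined by plain concatenation; the attention uses identity query/key/value projections (a trained model would learn a projection matrix per head) and runs over a generic sequence of vectors rather than specifically over characters; the BiLSTM encoder that would produce the CRF emission scores is omitted; and the function names `combine_embeddings`, `multi_head_self_attention`, and `viterbi_decode` are hypothetical.

```python
# Illustrative sketch only, not the paper's code. Assumptions: embeddings are
# concatenated, attention projections are identity, and the BiLSTM is omitted.
import math
from typing import List, Sequence

Vector = List[float]


def combine_embeddings(word_vecs: Sequence[Vector], char_vecs: Sequence[Vector]) -> List[Vector]:
    """Concatenate each token's word-level and character-level vectors."""
    if len(word_vecs) != len(char_vecs):
        raise ValueError("word and character sequences must have the same length")
    return [list(w) + list(c) for w, c in zip(word_vecs, char_vecs)]


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(xs: Sequence[Vector], num_heads: int) -> List[Vector]:
    """Multi-head scaled dot-product self-attention with identity Q/K/V projections.

    Each vector is split into num_heads equal slices; each head compares its
    slice at every position with every other position, and the per-head
    outputs are concatenated back to the original width.
    """
    dim = len(xs[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    head_dim = dim // num_heads
    outputs: List[Vector] = [[] for _ in xs]
    for h in range(num_heads):
        heads = [list(x[h * head_dim:(h + 1) * head_dim]) for x in xs]
        for i, q in enumerate(heads):
            # Similarity of position i to every position in the sequence.
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim) for k in heads]
            weights = softmax(scores)
            # Weighted average of the value slices (values = keys = queries here).
            outputs[i].extend(
                sum(w * v[j] for w, v in zip(weights, heads)) for j in range(head_dim)
            )
    return outputs


def viterbi_decode(emissions: Sequence[Vector], transitions: Sequence[Vector],
                   tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag path under a linear-chain CRF.

    emissions[t][j] scores tag j at position t; transitions[i][j] scores
    moving from tag i to tag j.
    """
    n = len(tags)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, bp = [], []
        for j in range(n):
            cands = [score[i] + transitions[i][j] for i in range(n)]
            best_i = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best_i] + emit[j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    best = max(range(n), key=score.__getitem__)
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]
```

For example, with the (assumed) BIO tags `O`, `B-PER`, `I-PER` and emissions `[[2.0, 0.0, 0.0], [0.5, 0.0, 1.0]]`, the decoder returns `["O", "I-PER"]` when all transitions are zero, but `["O", "O"]` once the `O` to `I-PER` transition is penalized with -10. This is how the CRF layer can enforce label-sequence constraints that per-token scores alone do not.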
Pages: 5427-5436
Number of pages: 10