Arabic Word Segmentation With Long Short-Term Memory Neural Networks and Word Embedding

Cited by: 12
Authors
Almuhareb, Abdulrahman [1]
Alsanie, Waleed [1]
Al-Thubaity, Abdulmohsen [1]
Affiliations
[1] King Abdulaziz City for Science and Technology, National Center for Artificial Intelligence and Big Data Technology, Riyadh 11442, Saudi Arabia
Keywords
Arabic word segmentation; bi-directional long short-term memory; deep learning; neural network; word embedding
DOI
10.1109/ACCESS.2019.2893460
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose an Arabic word segmentation technique based on a bi-directional long short-term memory deep neural network. The paper addresses two tasks: word segmentation only, and word segmentation with rewrite for nine rewrite cases. Word segmentation with rewrite concerns inferring letters that are dropped or changed when the main word unit is attached to another unit, and writing these letters back when the two units are separated by segmentation. We use only binary labels as indicators of segmentation positions: label 1 indicates the start of a new word (split) in a sequence of symbols that does not include whitespace, and label 0 indicates any other case (no-split). This differs from the mainstream feature representation for word segmentation, in which multi-valued labeling is used to mark the sequence symbols: beginning, inside, and outside. We used the Arabic Treebank data and its clitics segmentation scheme in our experiments. Without the help of any additional language resources, such as dictionaries, morphological analyzers, or rules, the trained model achieved a high F1 value for Arabic word segmentation only (98.03%) and for Arabic word segmentation with rewrite (more than 99% for frequent rewrite cases). We also compared our model with four state-of-the-art Arabic word segmenters. It performed better than the other segmenters on Modern Standard Arabic text, and it was the best among the segmenters that do not use any additional language resources in another test on Classical Arabic text.
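The binary split/no-split labeling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the example word is an illustrative Latin transliteration ("wktAbhm" as w + ktAb + hm) rather than data from the paper, and it assumes that the first symbol of every segment, including the very first one, receives label 1.

```python
from typing import List, Tuple


def segments_to_labels(segments: List[str]) -> Tuple[str, List[int]]:
    """Flatten segments into one symbol string plus a binary label per symbol.

    Assumption: label 1 marks the first symbol of each segment (split),
    label 0 marks every other symbol (no-split).
    """
    symbols: List[str] = []
    labels: List[int] = []
    for seg in segments:
        for i, ch in enumerate(seg):
            symbols.append(ch)
            labels.append(1 if i == 0 else 0)
    return "".join(symbols), labels


def labels_to_segments(symbols: str, labels: List[int]) -> List[str]:
    """Rebuild the segments from a symbol string and its binary labels."""
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    segments: List[str] = []
    for ch, lab in zip(symbols, labels):
        # Start a new segment on label 1, or if no segment exists yet.
        if lab == 1 or not segments:
            segments.append(ch)
        else:
            segments[-1] += ch
    return segments


# Illustrative transliterated example (hypothetical, not from the paper).
text, tags = segments_to_labels(["w", "ktAb", "hm"])
restored = labels_to_segments(text, tags)
```

Under these assumptions, a sequence model only has to predict one bit per symbol, and the segmentation is recovered by cutting before every symbol labeled 1. The rewrite task described in the abstract is not covered by this sketch.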
Pages: 12879-12887
Page count: 9