Arabic Word Segmentation With Long Short-Term Memory Neural Networks and Word Embedding

Cited by: 12
Authors
Almuhareb, Abdulrahman [1]
Alsanie, Waleed [1]
Al-Thubaity, Abdulmohsen [1]
Affiliations
[1] King Abdulaziz City Sci & Technol, Natl Ctr Artificial Intelligence & Big Data Techn, Riyadh 11442, Saudi Arabia
Keywords
Arabic word segmentation; bi-directional long short-term memory; deep learning; neural network; word embedding;
DOI
10.1109/ACCESS.2019.2893460
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose an Arabic word segmentation technique based on a bi-directional long short-term memory deep neural network. The paper addresses two tasks: word segmentation only, and word segmentation with rewrite for nine rewrite cases. Word segmentation with rewrite concerns inferring letters that are dropped or changed when the main word unit is attached to another unit, and writing these letters back when the two units are separated as a result of segmentation. We use only binary labels as indicators of segmentation positions: label 1 indicates the start of a new word (split) in a sequence of symbols not including whitespace, and label 0 indicates any other case (no-split). This differs from the mainstream feature representation for word segmentation, in which multi-valued labeling is used to mark the sequence symbols: beginning, inside, and outside. We used the Arabic Treebank data and its clitics segmentation scheme in our experiments. The trained model, without the help of any additional language resources such as dictionaries, morphological analyzers, or rules, achieved a high F1 value for Arabic word segmentation only (98.03%) and for Arabic word segmentation with rewrite (more than 99% for frequent rewrite cases). We also compared our model with four state-of-the-art Arabic word segmenters. It performed better than the other segmenters on a modern standard Arabic text, and it was the best among the segmenters that do not use any additional language resources in another test using classical Arabic text.
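The binary labeling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `segments_to_labels` and `labels_to_segments` are hypothetical, the Latin-letter toy word is only a stand-in for Arabic symbols, and labeling the very first symbol of the sequence as 0 (no split precedes it) is an assumption the abstract does not settle.

```python
from typing import List, Tuple


def segments_to_labels(segments: List[str]) -> Tuple[str, List[int]]:
    """Join gold segments into one symbol sequence with binary split labels.

    Label 1 marks a symbol that starts a new segment (split);
    label 0 marks every other symbol (no-split).
    Assumption: the sequence-initial symbol gets 0, since nothing precedes it.
    """
    symbols = "".join(segments)
    labels: List[int] = []
    for seg_index, segment in enumerate(segments):
        for char_index in range(len(segment)):
            is_split = seg_index > 0 and char_index == 0
            labels.append(1 if is_split else 0)
    return symbols, labels


def labels_to_segments(symbols: str, labels: List[int]) -> List[str]:
    """Recover segments by cutting the sequence before every label-1 symbol."""
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    segments: List[str] = []
    current = ""
    for symbol, label in zip(symbols, labels):
        if label == 1 and current:
            segments.append(current)
            current = ""
        current += symbol
    if current:
        segments.append(current)
    return segments


# Toy example: a three-segment sequence written in Latin letters.
symbols, labels = segments_to_labels(["w", "b", "ktAb"])
restored = labels_to_segments(symbols, labels)
```

In a setup like the one the abstract describes, a sequence model would predict one such 0/1 label per symbol; the round trip above only shows that the binary labels are enough to reconstruct the segmentation, without beginning/inside/outside tags.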
Pages: 12879 - 12887
Number of pages: 9
Related papers
50 in total
  • [41] Major-Minor Long Short-Term Memory for Word-Level Language Model
    Shuang, Kai
    Li, Rui
    Gu, Mengyu
    Loo, Jonathan
    Su, Sen
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (10) : 3932 - 3946
  • [42] DSP Based Acceleration for Long Short-Term Memory Model Based Word Prediction Application
    Zhu, Keqian
    Jiang, Jingfei
    [J]. 2017 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION (ICICTA 2017), 2017, : 93 - 99
  • [43] Combining fuzzy clustering and improved long short-term memory neural networks for short-term load forecasting
    Liu, Fu
    Dong, Tian
    Liu, Qiaoliang
    Liu, Yun
    Li, Shoutao
    [J]. ELECTRIC POWER SYSTEMS RESEARCH, 2024, 226
  • [44] Malware classification using word embeddings algorithms and long-short term memory networks
    Andrade, Eduardo de O.
    Viterbo, Jose
    Guerin, Joris
    Bernardini, Flavia
    [J]. COMPUTATIONAL INTELLIGENCE, 2022, 38 (05) : 1802 - 1830
  • [45] On the Initialization of Long Short-Term Memory Networks
    Ghazi, Mostafa Mehdipour
    Nielsen, Mads
    Pai, Akshay
    Modat, Marc
    Cardoso, M. Jorge
    Ourselin, Sebastien
    Sorensen, Lauge
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 275 - 286
  • [46] Evolving Long Short-Term Memory Networks
    Neto, Vicente Coelho Lobo
    Passos, Leandro Aparecido
    Papa, Joao Paulo
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 337 - 350
  • [47] Better Phonological Short-Term Memory Is Linked to Improved Cortical Memory Representations for Word Forms and Better Word Learning
    Ylinen, Sari
    Nora, Anni
    Service, Elisabet
    [J]. FRONTIERS IN HUMAN NEUROSCIENCE, 2020, 14
  • [48] Independence of input and output phonology in word processing and short-term memory
    Martin, RC
    Lesch, MF
    Bartha, MC
    [J]. JOURNAL OF MEMORY AND LANGUAGE, 1999, 41 (01) : 3 - 29
  • [49] WORD FREQUENCY AND UNIT SEQUENCE INTERFERENCE HYPOTHESIS IN SHORT-TERM MEMORY
    BADDELEY, AD
    SCOTT, D
    [J]. JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1971, 10 (01) : 35 - 40
  • [50] Multisyllabic Word Repetition as a Measure of Short-Term Phonological Memory in Children
    Kiese-Himmel, C.
    Reeh, M.
    [J]. SPRACHE-STIMME-GEHOR, 2010, 34 (02): : E10 - E15