Arabic Text Root Extraction via Morphological Analysis and Linguistic Constraints

Cited by: 7
Authors
Alsaad, Amal [1]
Abbod, Maysam [1]
Affiliation
[1] Brunel Univ, Dept Elect & Comp Engn, London, England
Keywords
Arabic root extraction; morphological analyser; natural language processing; data mining; text mining
DOI
10.1109/UKSim.2014.43
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Arabic is a highly inflected language, so effective Arabic text analysis with correct stem and root extraction is challenging. In this paper we present a linguistic root extraction approach composed of two main phases. The first phase handles the removal of affixes, including prefixes, suffixes and infixes. Prefixes and suffixes are removed depending on the length of the word, and the word's morphological pattern is checked after each deduction so that infixes can be removed. In the second phase, the root extraction algorithm is developed further to handle weak, hamzated, eliminated-long-vowel and two-letter geminated words, since a reasonably large number of irregular Arabic words occur in texts. Before roots are extracted, they are checked against a predefined list of 3800 triliteral and 900 quadriliteral roots. A series of experiments was conducted to improve and test the performance of the proposed algorithm. The results show that the rate of correctly extracted roots improved compared with Khoja's stemming algorithm.
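The first phase described in the abstract (length-guarded affix removal, a pattern check after each deduction, and validation against a root list) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the affix lists, the patterns, the three-entry root list, the minimum-length guard and all function names are hypothetical stand-ins chosen for illustration, and the second phase (weak, hamzated and geminated words) is not covered.

```python
from collections import deque

# Hypothetical toy resources; the paper uses its own affix rules and a list
# of 3800 triliteral and 900 quadriliteral roots, none of which is shown here.
PREFIXES = ["ال", "و", "ب"]
SUFFIXES = ["ون", "ات", "ة"]
# In a pattern, the letters ف ع ل mark root slots; any other letter is an
# infix (or pattern letter) that must match the word literally.
PATTERNS = ["فاعل", "مفعول", "فعال"]
ROOTS = {"كتب", "درس", "علم"}
MIN_LEN = 3  # assumed guard: never strip a word below three letters


def match_pattern(word, pattern):
    """Return the letters found in the root slots, or None if no match."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in "فعل":
            root.append(w)      # root slot: keep the word's letter
        elif w != p:
            return None         # literal pattern letter does not match
    return "".join(root)


def candidate_roots(word):
    """Yield possible roots for a word without stripping any affix."""
    if len(word) == 3:
        yield word
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            yield root


def extract_root(word):
    """Strip affixes step by step, checking patterns after each deduction."""
    queue = deque([word])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        for root in candidate_roots(current):
            if root in ROOTS:   # accept only roots found in the list
                return root
        for prefix in PREFIXES:
            if current.startswith(prefix) and len(current) - len(prefix) >= MIN_LEN:
                queue.append(current[len(prefix):])
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_LEN:
                queue.append(current[:-len(suffix)])
    return None                 # no validated root was found


for w in ["الكاتب", "مكتوب", "دارسون", "والعلم"]:
    print(w, extract_root(w))
```

Validating every candidate against the root list before accepting it mirrors the check described in the abstract; in this sketch a word whose candidates are all missing from the list simply yields `None`.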
Pages: 125-130
Page count: 6