An Efficient Text Representation for Searching and Retrieving Classical Diacritical Arabic Text

Cited by: 3
Authors
Hakak, Saqib [1]
Kamsin, Amirrudin [1]
Shivakumara, Palaiahnakote [1]
Tayan, Omar [2]
Idris, Mohd Yamani Idna [1]
Gilkar, Gulshan Amin [3]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
[2] Taibah Univ, Coll Comp Sci & Engn, Dept Comp Engn, Medina, Saudi Arabia
[3] Shaqra Univ, Coll Comp Sci & Informat Technol, Riyadh, Saudi Arabia
Keywords
Digital Quran; Pattern matching; Verification of Quran; Information retrieval; Quran authentication; Arabic/Farsi texts; Urdu texts
DOI
10.1016/j.procs.2018.10.470
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to the rapid growth of the Internet and advanced technologies, storing Arabic diacritical data and extracting it in real time from an Arabic corpus have become vital issues in information retrieval. In this paper, we propose a new way of representing Arabic diacritical text in the corpus so that search engines can reduce the time needed to retrieve the desired text with high precision. To achieve this, we segment the Arabic diacritical sentences/verses into individual characters together with their diacritics, which are necessary for interpreting meaning. We then propose a new data structure that represents the data using the segmented characters. To verify the corpus representation, the proposed approach uses the Boyer-Moore algorithm to search for given verses of Arabic diacritical data. The proposed data structure reduces the worst-case search time from O(m*n) to O(1+m), where m denotes the length of the diacritical verse to be searched and n denotes the total number of diacritical verses. Experimental results on a popular corpus show that the proposed method outperforms existing search methods in terms of time complexity. © 2018 The Authors. Published by Elsevier B.V.
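The abstract does not specify the proposed data structure, so the following is only a rough sketch of the two ideas it names: segmenting diacritical text into base characters with their attached diacritics, and searching over those units. The function names `segment` and `horspool_search` are hypothetical, and the search uses the Boyer-Moore-Horspool simplification (bad-character shifts only) rather than the full Boyer-Moore algorithm used in the paper.

```python
import unicodedata


def segment(text):
    """Split text into units: one base character plus its combining marks.

    Assumption: diacritics are Unicode combining marks (true for Arabic
    fatha, kasra, damma, sukun, shadda). Not the paper's actual structure.
    """
    text = unicodedata.normalize("NFC", text)  # canonical mark ordering
    units = []
    for ch in text:
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach the diacritic to the preceding letter
        else:
            units.append(ch)
    return units


def horspool_search(haystack, needle):
    """Return the first index where needle occurs in haystack, or -1.

    Both arguments are lists of units produced by segment().
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: how far to shift when the unit aligned with the
    # last needle position is `unit` (final needle position excluded).
    shift = {unit: m - 1 - i for i, unit in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


# "bismi": ba+kasra, seen+sukun, meem+kasra (written with escapes)
word = "\u0628\u0650\u0633\u0652\u0645\u0650"
units = segment(word)
with_marks = segment("\u0633\u0652\u0645\u0650")  # seen+sukun, meem+kasra
without_marks = segment("\u0633\u0645")           # seen, meem (no diacritics)
```

Because each unit carries its diacritics, `horspool_search(units, with_marks)` finds the diacritized pattern, while the undiacritized pattern in `without_marks` does not match. That is the diacritic-sensitive behaviour the abstract describes as necessary for interpreting meaning.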
Pages: 150-157
Number of pages: 8