QSST: A Quranic Semantic Search Tool based on word embedding

Cited by: 9
Authors
Mohamed, Ensaf Hussein [1]
Shokry, Eyad Mohamed [1]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Comp Sci Dept, Cairo, Egypt
Keywords
Information Retrieval; Word Embedding; Concept-based Search; Ontology; Semantic Search; Arabic Natural Language Processing; Holy Quran
DOI
10.1016/j.jksuci.2020.01.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Retrieving information from the Quran is an important task for Quran scholars and Arabic researchers. Quran search techniques fall into two types: keyword-based and semantic (concept-based). Concept-based search is challenging, especially in a complex corpus such as the Quran. This paper presents QSST, a concept-based search tool for the Holy Quran. It consists of four phases. In the first phase, the Quran dataset is built by manually annotating Quran verses based on the ontology of Mushaf Al-Tajweed. The second phase is word embedding, which generates feature vectors for words by training a Continuous Bag of Words (CBOW) architecture on a large Quranic and Classical Arabic corpus. The third phase calculates the feature vectors of both the input query and the Quranic topics. Finally, the fourth phase retrieves the most relevant verses by computing the cosine similarity between the topic and query vectors. The performance of the proposed QSST is measured by comparing its results against Mushaf Al-Tajweed; the resulting precision, recall, and F-score were 76.91%, 72.23%, and 69.28%, respectively. In addition, the results were evaluated by three Islamic experts, and the average precision was 91.95%. Finally, QSST's results are compared with those of recent existing tools; QSST outperformed them. © 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.
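
The third and fourth phases described in the abstract (vectorizing the query and the topics, then ranking by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy 3-dimensional embeddings, the topic names, the placeholder verse ids, and the helper names `mean_vector`, `cosine`, and `rank_topics` are all hypothetical, and averaging word vectors is an assumed way of building a query or topic vector that the abstract does not confirm.

```python
import math

# Hypothetical toy embeddings standing in for CBOW-trained word vectors.
EMBEDDINGS = {
    "prayer": [0.9, 0.1, 0.0],
    "worship": [0.8, 0.2, 0.1],
    "charity": [0.1, 0.9, 0.0],
    "alms": [0.2, 0.8, 0.1],
    "fasting": [0.0, 0.1, 0.9],
}

# Hypothetical topics: descriptive words plus placeholder verse ids.
TOPICS = {
    "Prayer": {"words": ["prayer", "worship"], "verses": ["v1", "v2"]},
    "Charity": {"words": ["charity", "alms"], "verses": ["v3"]},
    "Fasting": {"words": ["fasting"], "verses": ["v4"]},
}


def mean_vector(words):
    """Average the embeddings of the known words; None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_topics(query):
    """Return (topic, score, verses) tuples sorted by descending similarity."""
    query_vec = mean_vector(query.lower().split())
    if query_vec is None:
        return []
    scored = []
    for name, topic in TOPICS.items():
        topic_vec = mean_vector(topic["words"])
        if topic_vec is not None:
            scored.append((name, cosine(query_vec, topic_vec), topic["verses"]))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score, verses in rank_topics("alms"):
        print(f"{name}: {score:.3f} {verses}")
```

In this sketch, a query made only of out-of-vocabulary words returns an empty list rather than a guess; a real system would need its own policy for that case.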
Pages: 934-945
Page count: 12