Arabic real time entity resolution using inverted indexing

Cited by: 2
Authors
Alian, Marwah [1, 3]
Al-Naymat, Ghazi [2, 3]
Ramadan, Banda [4]
Affiliations
[1] Hashemite Univ, Zarqa, Jordan
[2] Ajman Univ, Ajman, U Arab Emirates
[3] Princess Sumaya Univ Technol, Amman, Jordan
[4] Prince Sultan Univ, Riyadh, Saudi Arabia
Keywords
Arabic Entity Resolution; Similarity Aware Inverted Indexes; Similarity functions; Record pair comparison
DOI
10.1007/s10579-020-09504-6
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
When an Arabic dataset contains two or more records for the same real-world entity (e.g., a person or an object), data quality and system performance suffer, particularly where institutions have no mechanism for detecting these duplicates. The operation that identifies records referring to the same real-world entity is called Entity Resolution (ER). ER is used to link records across databases and to match query records against existing databases in real time. Indexing is a major step in the ER process that aims to reduce the search space. Several indexing techniques are available for ER on English databases; however, they have not been validated for other languages, such as Arabic. The Dynamic Similarity Aware Inverted Index (DySimII) is an indexing technique used with dynamic databases to match query records in real time, and it has been shown to work well with English. In this paper, we propose a framework, Arabic Real Time Entity Resolution (ARTER), that uses DySimII with Arabic databases to perform real-time ER. We also examine different string similarity functions for comparing records in the matching process, to evaluate which is most suitable for comparing Arabic strings. Our experimental evaluation uses a real-world Arabic database, with two stemmers and three similarity functions, to observe their effect on DySimII with an Arabic dataset. The results show that matching accuracy improves with the Asem stemmer as the number of corrupted attributes increases. Tests of the three similarity functions show that the Winkler similarity function provides better matching accuracy, while N-gram provides better results when used with the Asem stemmer.
Pages: 921-941
Number of pages: 21
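
Illustrative sketch
The abstract describes indexing as a way to reduce the search space and mentions N-gram similarity for comparing Arabic strings. The Python sketch below illustrates that general idea on a single attribute. It is not the authors' DySimII or ARTER implementation: the class name `SimilarityIndex`, the first-character blocking key, the Dice bigram similarity, and the 0.5 threshold are all assumptions chosen for illustration, and stemming is omitted entirely.

```python
from collections import defaultdict


def bigrams(s):
    """Return the set of character bigrams of s (the string itself if too short)."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


class SimilarityIndex:
    """Hypothetical single-attribute index that caches similarities per block."""

    def __init__(self, block_key=lambda v: v[:1]):
        self.block_key = block_key          # assumed blocking rule: first character
        self.records = defaultdict(set)     # attribute value -> record ids
        self.blocks = defaultdict(set)      # block key -> attribute values
        self.sims = defaultdict(dict)       # value -> {other value: similarity}

    def insert(self, record_id, value):
        """Add a record; compare a new value only with values in its own block."""
        if value not in self.records:
            key = self.block_key(value)
            for other in self.blocks[key]:
                s = bigram_similarity(value, other)
                self.sims[value][other] = s
                self.sims[other][value] = s
            self.blocks[key].add(value)
        self.records[value].add(record_id)

    def query(self, value, threshold=0.5):
        """Return {record_id: similarity} for candidates at or above threshold."""
        if value in self.records:
            # Known value: reuse the cached similarities.
            sims = {value: 1.0, **self.sims.get(value, {})}
        else:
            # Unseen value: compare only against values in the same block.
            block = self.blocks.get(self.block_key(value), ())
            sims = {other: bigram_similarity(value, other) for other in block}
        scores = {}
        for other, s in sims.items():
            if s >= threshold:
                for rid in self.records[other]:
                    scores[rid] = max(s, scores.get(rid, 0.0))
        return scores


index = SimilarityIndex()
index.insert(1, "محمد")
index.insert(2, "محمود")
index.insert(3, "أحمد")
matches = index.query("محمد")
```

In this toy run the query value shares a block with records 1 and 2 only, so record 3 is never compared. A real system would need a blocking key and similarity function suited to Arabic orthography, which is the question the paper evaluates.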