Arabic real time entity resolution using inverted indexing

Times cited: 2
Authors
Alian, Marwah [1, 3]
Al-Naymat, Ghazi [2, 3]
Ramadan, Banda [4]
Affiliations
[1] Hashemite Univ, Zarqa, Jordan
[2] Ajman Univ, Ajman, U Arab Emirates
[3] Princess Sumaya Univ Technol, Amman, Jordan
[4] Prince Sultan Univ, Riyadh, Saudi Arabia
Keywords
Arabic Entity Resolution; Similarity Aware Inverted Indexes; Similarity functions; Record pair comparison;
DOI
10.1007/s10579-020-09504-6
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Arabic datasets that contain two or more records for the same real-world entity (e.g., a person or an object) cause institutions to suffer from low data quality and degraded performance, because they have no mechanism for detecting these duplicates. The process of identifying records that refer to the same real-world entity is called Entity Resolution (ER). ER is used both to link records across databases and to match query records against existing databases in real time. Indexing is a major step in the ER process that aims to reduce the search space. Several indexing techniques are available for ER, mostly developed for English databases; however, it has not been validated whether they work well with other languages such as Arabic. The Dynamic Similarity-Aware Inverted Index (DySimII) is an indexing technique used with dynamic databases to match query records in real time, and it has been shown to work well with English. In this paper, we propose a framework, Arabic Real Time Entity Resolution (ARTER), that uses DySimII with Arabic databases to perform real-time ER. We also examine the string similarity functions used to compare records in the matching process, in order to evaluate which of them is most suitable for comparing Arabic strings. Our experimental evaluation uses a real-world Arabic database, with two stemmers and three similarity functions, to measure their effect on DySimII with an Arabic dataset. The results show that matching accuracy improves with the Asem stemmer as the number of corrupted attributes increases. Testing the three similarity functions also shows that the Winkler similarity function provides better matching accuracy, while N-gram provides better results when used with the Asem stemmer.
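As a rough illustration of how a similarity-aware inverted index reduces the search space, the following Python sketch builds a simplified index in the spirit of DySimII. It has three parts: a record index (attribute value to record IDs), a blocking index (blocking key to attribute values), and a similarity index (attribute value to precomputed similarities with other values in the same block). This is not the ARTER implementation. The class name `SimilarityAwareIndex`, the first-character blocking key, the additive scoring, and the absence of any stemmer are hypothetical simplifications. The two similarity functions are textbook Jaro-Winkler and a bigram Dice coefficient, used here as stand-ins for the Winkler and N-gram functions named in the abstract.

```python
from collections import defaultdict


def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity between two strings."""
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_hit, b_hit = [False] * la, [False] * lb
    matches = 0
    for i, ch in enumerate(a):
        # Characters match only if equal and within the search window.
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / la + matches / lb + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4 chars)."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def ngram_sim(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over the sets of character q-grams (no padding)."""
    def grams(s: str) -> set:
        return {s[i:i + q] for i in range(len(s) - q + 1)} if len(s) >= q else {s}
    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


class SimilarityAwareIndex:
    """Hypothetical, simplified DySimII-style index (one entry per attribute value)."""

    def __init__(self, sim=jaro_winkler, block_key=lambda v: v[:1]):
        self.sim = sim
        self.block_key = block_key      # hypothetical blocking key: first character
        self.ri = defaultdict(set)      # record index: (attr, value) -> record IDs
        self.bi = defaultdict(set)      # blocking index: (attr, key) -> values
        self.si = defaultdict(dict)     # similarity index: (attr, value) -> {other: sim}

    def insert(self, rec_id: str, record: dict) -> None:
        for attr, value in record.items():
            if (attr, value) not in self.ri:
                # New value: compare it only with values already in its block,
                # and store the similarity in both directions.
                block = self.bi[(attr, self.block_key(value))]
                for other in block:
                    s = self.sim(value, other)
                    self.si[(attr, value)][other] = s
                    self.si[(attr, other)][value] = s
                block.add(value)
            self.ri[(attr, value)].add(rec_id)

    def query(self, record: dict) -> list:
        """Rank candidate record IDs by summed per-attribute similarity."""
        scores = defaultdict(float)
        for attr, value in record.items():
            if (attr, value) in self.ri:
                for rid in self.ri[(attr, value)]:
                    scores[rid] += 1.0  # exact match on this attribute
                neighbours = self.si.get((attr, value), {})
            else:
                # Unseen value: compute similarities against its block on the fly.
                block = self.bi.get((attr, self.block_key(value)), set())
                neighbours = {o: self.sim(value, o) for o in block}
            for other, s in neighbours.items():
                for rid in self.ri[(attr, other)]:
                    scores[rid] += s
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


idx = SimilarityAwareIndex()
idx.insert("r1", {"name": "محمد", "city": "عمان"})
idx.insert("r2", {"name": "محمود", "city": "عمان"})
idx.insert("r3", {"name": "احمد", "city": "الزرقاء"})
print(idx.query({"name": "محمد", "city": "عمان"}))  # r1 ranks first; r3 is never compared
```

Similarities are computed only between values that share a blocking key, and only once, when a value is first inserted. A query therefore touches a small part of the data rather than every record. In this toy data, `r3` is never scored because neither of its values falls in the query's blocks. Swapping `sim=ngram_sim` into the constructor compares the two similarity functions on the same index.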
Pages: 921-941
Number of pages: 21