Unsupervised learning blocking keys technique for indexing Arabic entity resolution

Authors
Marwah Alian
Arafat Awajan
Bandan Ramadan
Affiliations
[1] Hashemite University
[2] Princess Sumaya University for Technology
[3] Prince Sultan University
Keywords
Arabic entity resolution; Learning keys; Indexing; Arabic datasets
DOI
Not available
Abstract
Attribute values in textual datasets are subject to various types of errors introduced during data entry, such as typographical errors, pronunciation errors, or dialectal variations. These errors make entity resolution more challenging. Iterative blocking, in which each record is stored in more than one block, can be used to cope with such errors, mainly in query access. A blocking index selects the subset of record pairs that share a block for later detailed similarity computation and discards pairs that fall in different blocks as irrelevant. This work aims to address these problems for Arabic texts. It proposes adapting a model for learning blocking keys and analyzes its performance on Arabic datasets. The learned keys are passed as blocking keys to the Dynamic Similarity-Aware Inverted Index (DySimII), which has worked efficiently with Arabic datasets. The model is tested on a telephone book dataset that contains duplicate records with phonetic and typing errors in attribute values. Using the learned keys, matching accuracy reaches 84% when only a small number of attributes are corrupted, while performance declines as the number of corrupted attributes increases.
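As a rough illustration of the blocking idea described in the abstract, the sketch below inserts each record into one block per blocking key, so a record can live in several blocks, and then keeps only pairs that share at least one block for detailed comparison. This is not the authors' learning model or DySimII. The two key functions (`key_name_prefix`, `key_last_name`), the Arabic letter folding in `normalize`, the bigram Jaccard similarity, and the sample records are all hypothetical stand-ins chosen for illustration; a learned scheme would select the keys instead of hard-coding them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical normalization: fold Arabic letter variants often confused during
# data entry (hamza forms of alef -> bare alef, taa marbuta -> haa, alef maqsura -> yaa).
_FOLD = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ة": "ه", "ى": "ي"})

def normalize(text):
    return text.translate(_FOLD).strip()

# Hypothetical hand-picked blocking keys (a learned scheme would choose these).
def key_name_prefix(rec):
    return "fn:" + normalize(rec["first"])[:2]

def key_last_name(rec):
    return "ln:" + normalize(rec["last"])

def build_blocks(records, key_funcs):
    """Insert each record into one block per key function (multi-block indexing)."""
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for key in key_funcs:
            blocks[key(rec)].add(rid)
    return blocks

def candidate_pairs(blocks):
    """Keep only pairs that share at least one block; all other pairs are discarded."""
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs

def bigram_jaccard(a, b):
    """Hypothetical detailed similarity: Jaccard overlap of character bigrams."""
    x = {s[i:i + 2] for s in [normalize(a)] for i in range(len(s) - 1)}
    y = {s[i:i + 2] for s in [normalize(b)] for i in range(len(s) - 1)}
    return len(x & y) / len(x | y) if (x | y) else 1.0

def reduction_ratio(n_records, n_pairs):
    """Fraction of all possible record pairs that blocking avoids comparing."""
    total = n_records * (n_records - 1) // 2
    return 1 - n_pairs / total

# Hypothetical phone-book style records with spelling variants.
records = {
    1: {"first": "أحمد", "last": "علي"},
    2: {"first": "احمد", "last": "علي"},
    3: {"first": "سارة", "last": "حسن"},
    4: {"first": "ساره", "last": "حسين"},
}

blocks = build_blocks(records, [key_name_prefix, key_last_name])
pairs = candidate_pairs(blocks)
for a, b in sorted(pairs):
    sim = bigram_jaccard(records[a]["last"], records[b]["last"])
    print(a, b, round(sim, 2))
print("reduction ratio:", round(reduction_ratio(len(records), len(pairs)), 3))
```

Records 3 and 4 disagree on the last name (حسن vs حسين), yet they still become a candidate pair because the first-name block links them. This is the benefit of storing each record in more than one block: an error in one attribute does not by itself hide a true match.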
Pages: 621-628
Number of pages: 7