Unsupervised learning blocking keys technique for indexing Arabic entity resolution

Cited by: 0
Authors
Marwah Alian
Arafat Awajan
Bandan Ramadan
Affiliations
[1] Hashemite University
[2] Princess Sumaya University for Technology
[3] Prince Sultan University
Keywords
Arabic entity resolution; Learning keys; Indexing; Arabic datasets
DOI
Not available
Abstract
Attribute values in textual datasets are subject to various types of errors introduced during data entry, such as typographical errors, pronunciation errors, and dialectal variations. These errors make entity resolution more challenging. The iterative blocking indexing technique can be used to address this type of error, mainly in query access, where each record is stored in more than one block. A blocking index selects the subset of record pairs that fall in the same block for later detailed similarity computation and discards pairs in different blocks as irrelevant. This work addresses these problems for Arabic text. It adapts a model for learning blocking keys and analyzes its performance on Arabic datasets. The learned keys are passed as blocking keys to the Dynamic Similarity-Aware Inverted Index (DySimII), which has worked efficiently with Arabic datasets. The model is tested on a telephone-book dataset that contains duplicate records and attribute values corrupted by phonetic and typing errors. Using the learned keys, matching accuracy reaches 84% when only a few attributes are corrupted, while performance declines as the number of corrupted attributes increases.
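The Python sketch below shows the blocking step described above. It uses generic multi-key blocking: each record can be placed in more than one block, and only pairs that share a block become candidates for detailed similarity comparison. Several parts are hypothetical assumptions made for illustration: the key functions (`name_prefix`, `phone_suffix`), the Arabic letter-folding table, and the sample records. It does not reproduce the authors' learned-key model or the DySimII index.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical folding table for Arabic letter variants that are often
# confused during data entry (an illustrative assumption, not the paper's method).
_ARABIC_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0629": "\u0647",  # taa marbuta -> haa
    "\u0649": "\u064A",  # alef maqsura -> yaa
})


def normalise(text: str) -> str:
    """Strip surrounding whitespace and fold common Arabic letter variants."""
    return text.strip().translate(_ARABIC_FOLD)


def blocking_keys(record: dict) -> set:
    """Return every block a record belongs to (hypothetical key functions)."""
    keys = set()
    name = normalise(record["name"])
    phone = record["phone"]
    if name:
        keys.add(("name_prefix", name[:3]))     # first three letters of the name
    if len(phone) >= 4:
        keys.add(("phone_suffix", phone[-4:]))  # last four digits of the phone
    return keys


def build_blocks(records: dict) -> dict:
    """Map each blocking key to the set of record ids stored under it."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        for key in blocking_keys(record):
            blocks[key].add(rid)
    return blocks


def candidate_pairs(blocks: dict) -> set:
    """Collect distinct id pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample records (not the paper's telephone-book dataset).
records = {
    1: {"name": "أحمد", "phone": "0791234567"},
    2: {"name": "احمد", "phone": "0791234576"},   # hamza dropped, digits swapped
    3: {"name": "فاطمة", "phone": "0787654321"},
    4: {"name": "فاطمه", "phone": "0780000000"},  # taa marbuta typed as haa
    5: {"name": "سامي", "phone": "0770001111"},
}

print(sorted(candidate_pairs(build_blocks(records))))  # [(1, 2), (3, 4)]
```

With these five sample records, only 2 of the 10 possible pairs survive blocking. Records 1 and 2 share a block only because the folding step maps the hamza-carrying alef to a bare alef. Their phone numbers differ by a digit transposition, so the phone-suffix key alone would miss them. Storing each record under several keys produces a larger candidate set but improves recall.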
Pages: 621-628
Page count: 7
Related papers (50 in total)
  • [21] A Blocking Scheme for Entity Resolution in the Semantic Web
    Costa, Gustavo de Assis
    Parente de Oliveira, Jose Maria
    IEEE 30TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS IEEE AINA 2016, 2016, : 1138 - 1145
  • [22] Semantic-Aware Blocking for Entity Resolution
    Wang, Qing
    Cui, Mingyuan
    Liang, Huizhi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (01) : 166 - 180
  • [23] Efficient Spectral Neighborhood Blocking for Entity Resolution
    Shu, Liangcai
    Chen, Aiyou
    Xiong, Ming
    Meng, Weiyi
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 1067 - 1078
  • [24] Blocking and Filtering Techniques for Entity Resolution: A Survey
    Papadakis, George
    Skoutas, Dimitrios
    Thanos, Emmanouil
    Palpanas, Themis
    ACM COMPUTING SURVEYS, 2020, 53 (02)
  • [25] Semantic-Aware Blocking for Entity Resolution
    Wang, Qing
    Cui, Mingyuan
    Liang, Huizhi
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1468 - 1469
  • [26] MFIBlocks: An effective blocking algorithm for entity resolution
    Kenig, Batya
    Gal, Avigdor
    INFORMATION SYSTEMS, 2013, 38 (06) : 908 - 926
  • [27] An Unsupervised Algorithm for Learning Blocking Schemes
    Kejriwal, Mayank
    Miranker, Daniel P.
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 340 - 349
  • [28] Overlapped Hashing: A Novel Scalable Blocking Technique for Entity Resolution in Big-Data Era
    Khalil, Rana
    Shawish, Ahmed
    Elzanfaly, Doaa
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 427 - 441
  • [29] Unsupervised Entity Resolution on Multi-type Graphs
    Zhu, Linhong
    Ghasemi-Gol, Majid
    Szekely, Pedro
    Galstyan, Aram
    Knoblock, Craig A.
    SEMANTIC WEB - ISWC 2016, PT I, 2016, 9981 : 649 - 667
  • [30] Unsupervised Entity Resolution Method Based on Random Forest
    Xu, Wanying
    Sun, Chenchen
    Xu, Lei
    Chen, Wenyu
    Hou, Zhijiang
    WEB INFORMATION SYSTEMS AND APPLICATIONS (WISA 2021), 2021, 12999 : 372 - 382