Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution

Cited by: 0
Authors
Ramadan, Banda [1]
Christen, Peter [1]
Liang, Huizhi [1]
Affiliation
[1] Australian National University, College of Engineering and Computer Science, Research School of Computer Science, Canberra, ACT 0200, Australia
Keywords
Dynamic indexing; data matching; braided tree
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Real-time entity resolution is the process of matching query records in sub-second time with records in a database that represent the same real-world entity. Indexing techniques are used to efficiently extract from the database a set of candidate records that are similar to a query record, and these candidates are then compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been used successfully for entity resolution of very large databases. However, because it is based on static sorted arrays, this technique is not suitable for dynamic databases. We propose a tree-based dynamic sorted neighborhood index that facilitates matching a stream of query records against a large and dynamic database in real time. We evaluate our approach on two large data sets. Our results show that the times for both inserting and querying records stay nearly constant as the index grows, and that our approach achieves indexing and querying times over one order of magnitude faster than an earlier real-time entity resolution technique, with comparably high matching accuracy.
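The abstract describes the index only at a high level. As a rough illustration of the general idea (not the authors' implementation), the Python sketch below keeps distinct sorting key values (SKVs) in sorted order, maps each SKV to the records that share it, and answers a query by comparing it only with records whose SKVs fall inside a window around the query's position. Several assumptions are made: a plain sorted list maintained with `bisect` stands in for the tree (the keywords mention a braided tree), so insertion cost here grows with index size, unlike the nearly constant times the abstract reports; and the class name `DynamicSortedNeighborhoodIndex`, the `window` and `threshold` parameters, and the use of `difflib.SequenceMatcher` as the record comparison function are all hypothetical choices.

```python
import bisect
from collections import defaultdict
from difflib import SequenceMatcher


class DynamicSortedNeighborhoodIndex:
    """Hypothetical sketch of a dynamic sorted neighborhood index.

    A sorted Python list of distinct sorting key values (SKVs) stands in for
    the tree structure; each SKV maps to the IDs of all records sharing it.
    """

    def __init__(self, window: int = 1):
        self.window = window  # neighbouring SKVs taken on each side of a query
        self.keys: list[str] = []  # distinct SKVs, kept in sorted order
        self.records: dict[str, list[str]] = defaultdict(list)  # SKV -> record IDs
        self.values: dict[str, str] = {}  # record ID -> string used for comparison

    def insert(self, rec_id: str, skv: str, value: str) -> None:
        # A new SKV goes into its sorted position; bisect.insort shifts list
        # elements, so this costs O(n) rather than a balanced tree's O(log n).
        if skv not in self.records:
            bisect.insort(self.keys, skv)
        self.records[skv].append(rec_id)
        self.values[rec_id] = value

    def candidates(self, skv: str) -> list[str]:
        # Find where the query SKV falls in sort order. If it is already
        # indexed, keep it plus `window` keys on each side; otherwise take
        # `window` keys before and after its insertion point.
        pos = bisect.bisect_left(self.keys, skv)
        end = pos + self.window
        if pos < len(self.keys) and self.keys[pos] == skv:
            end += 1
        neighbours = self.keys[max(0, pos - self.window):end]
        return [rid for key in neighbours for rid in self.records[key]]

    def query(self, skv: str, value: str, threshold: float = 0.8) -> list[tuple[str, float]]:
        # Compare the query only with candidates from the window and keep
        # those at or above the threshold, best match first.
        # SequenceMatcher.ratio() is a stand-in similarity measure.
        matches = []
        for rec_id in self.candidates(skv):
            score = SequenceMatcher(None, value, self.values[rec_id]).ratio()
            if score >= threshold:
                matches.append((rec_id, round(score, 3)))
        return sorted(matches, key=lambda m: m[1], reverse=True)


# Usage with made-up records, using the surname as the SKV.
idx = DynamicSortedNeighborhoodIndex(window=1)
idx.insert("r1", "smith", "john smith canberra")
idx.insert("r2", "smyth", "jon smyth canberra")
idx.insert("r3", "jones", "mary jones sydney")
idx.insert("r4", "taylor", "paul taylor perth")
print(idx.candidates("smith"))  # IDs from the SKVs jones, smith, smyth
print(idx.query("smith", "john smith canberra"))
```

Because inserts and queries can be interleaved freely, a newly inserted record is visible to the very next query without re-sorting the whole database, which is the property that a static sorted array lacks.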
Pages: 1-12
Number of pages: 12
Related papers (50 in total)
  • [1] Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution
    Ramadan, Banda
    Christen, Peter
    Liang, Huizhi
    Gayler, Ross W.
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2015, 6 (04):
  • [2] Arabic real time entity resolution using inverted indexing
    Alian, Marwah
    Al-Naymat, Ghazi
    Ramadan, Banda
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (04) : 921 - 941
  • [4] Real-time Entity Resolution by Multiple Indices
    Zhu, Liang
    Cui, Rundong
    Ma, Qin
    Meng, Weiyi
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019, : 1063 - 1068
  • [5] Unsupervised Blocking Key Selection for Real-Time Entity Resolution
    Ramadan, Banda
    Christen, Peter
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078 : 574 - 585
  • [6] Keyword Search with Real-time Entity Resolution in Relational Databases
    Zhu, Liang
    Du, Xu
    Ma, Qin
    Meng, Weiyi
    Liu, Haibo
    PROCEEDINGS OF 2018 10TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING (ICMLC 2018), 2018, : 134 - 139
  • [7] Top-N Query Processing with Real-time Entity Resolution
    Zhu, Liang
    Fan, Shuaibing
    Ma, Qin
    Meng, Weiyi
    Liu, Haibo
    2017 EUROPEAN CONFERENCE ON ELECTRICAL ENGINEERING AND COMPUTER SCIENCE (EECS), 2017, : 236 - 241
  • [8] Dynamic Indexing for Incremental Entity Resolution in Data Integration Systems
    Vieira, Priscilla Kelly M.
    Loscio, Bernadette Farias
    Salgado, Ana Carolina
    ICEIS: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2017, : 185 - 192
  • [9] Adaptive Sorted Neighborhood Blocking for Entity Matching with MapReduce
    Mestre, Demetrio Gomes
    Pires, Carlos Eduardo
    Nascimento, Dimas C.
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 981 - 987
  • [10] Evaluating Top-N Join Queries with Real-time Entity Resolution
    Zhu, Liang
    Cheng, Ye
    Wang, Yu
    Ma, Qin
    Meng, Weiyi
    5TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI2020), 2020, 1575