Dynamic Indexing for Incremental Entity Resolution in Data Integration Systems

Cited by: 0
Authors
Vieira, Priscilla Kelly M. [1, 2]
Loscio, Bernadette Farias [1]
Salgado, Ana Carolina [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
[2] Univ Fed Rural Pernambuco, Recife, PE, Brazil
Keywords
Data Integration; Entity Resolution; Data Matching; Duplicate Detection; Indexing;
DOI
10.5220/0006251801850192
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Entity Resolution (ER) is the problem of identifying groups of tuples, from one or multiple data sources, that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging in scenarios with dynamic data sources or large volumes of data. Because most ER techniques process all tuples at once, new solutions have been proposed to cope with large volumes of data. One possible approach is to perform the ER process on query results rather than on the whole data set. It is also possible to reuse the results of previous ER tasks to reduce the number of pairwise tuple comparisons at query time. Similarly, indexing techniques can help identify equivalent tuples and further reduce the number of pairwise comparisons. In this context, this work proposes an indexing technique for incremental Entity Resolution processes. The expected contributions of this work are the specification, implementation, and evaluation of the proposed indexes. We performed experiments measuring the time spent storing, accessing, and updating the indexes. We concluded that reusing previous results makes the ER process more efficient than recomparing tuples, with similar quality of results.
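The idea of reusing earlier ER results at query time can be illustrated with a small sketch. This is not the index structure proposed in the paper, whose details the abstract does not give. It is a hypothetical pair-decision cache, assuming tuples carry stable identifiers and a single string attribute, and using the standard-library `difflib.SequenceMatcher` with an arbitrary threshold of 0.8 as a stand-in similarity function. The names `PairDecisionIndex`, `similar`, and `resolve` are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


class PairDecisionIndex:
    """Hypothetical cache of earlier pairwise match decisions, keyed by tuple ids."""

    def __init__(self):
        self._decisions = {}

    @staticmethod
    def _key(id_a, id_b):
        # Order-independent key so (a, b) and (b, a) hit the same entry.
        return (id_a, id_b) if id_a <= id_b else (id_b, id_a)

    def get(self, id_a, id_b):
        # Returns True/False for a known pair, or None if never compared.
        return self._decisions.get(self._key(id_a, id_b))

    def put(self, id_a, id_b, is_match):
        self._decisions[self._key(id_a, id_b)] = is_match


def similar(a, b, threshold=0.8):
    # Stand-in similarity function (an assumption, not the paper's matcher).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def resolve(query_result, index, threshold=0.8):
    """Resolve entities within one query result.

    query_result maps tuple id -> attribute value.
    Returns (list of matching id pairs, number of new comparisons performed).
    """
    matches, comparisons = [], 0
    for (id_a, val_a), (id_b, val_b) in combinations(sorted(query_result.items()), 2):
        decision = index.get(id_a, id_b)
        if decision is None:
            # Pair not seen before: compare once and remember the outcome.
            decision = similar(val_a, val_b, threshold)
            index.put(id_a, id_b, decision)
            comparisons += 1
        if decision:
            matches.append((id_a, id_b))
    return matches, comparisons
```

When a second query returns tuples that overlap with an earlier result, only the pairs involving unseen tuples are compared again. The sketch assumes the threshold stays fixed across queries, since cached decisions are not recomputed.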
Pages: 185 - 192
Page count: 8
Related papers
50 in total
  • [1] Incremental entity resolution process over query results for data integration systems
    Machado Vieira, Priscilla Kelly
    Loscio, Bernadette Farias
    Salgado, Ana Carolina
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (02) : 451 - 471
  • [2] Incremental entity resolution process over query results for data integration systems
    Priscilla Kelly Machado Vieira
    Bernadette Farias Lóscio
    Ana Carolina Salgado
    Journal of Intelligent Information Systems, 2019, 52 : 451 - 471
  • [3] Incremental entity resolution on rules and data
    Whang, Steven Euijong
    Garcia-Molina, Hector
    VLDB JOURNAL, 2014, 23 (01) : 77 - 102
  • [4] Incremental entity resolution on rules and data
    Steven Euijong Whang
    Hector Garcia-Molina
    The VLDB Journal, 2014, 23 : 77 - 102
  • [5] A Strategy for Selecting Relevant Attributes for Entity Resolution in Data Integration Systems
    Canalle, Gabrielle Karine
    Loscio, Bernadette Farias
    Salgado, Ana Carolina
    ICEIS: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2017 : 80 - 88
  • [6] Dynamic Data Retrieval Using Incremental Clustering and Indexing
    Priya, Uma D.
    Thilagam, Santhi P.
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2020, 10 (03) : 74 - 91
  • [7] Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution
    Ramadan, Banda
    Christen, Peter
    Liang, Huizhi
    DATABASES THEORY AND APPLICATIONS, ADC 2014, 2014, 8506 : 1 - 12
  • [8] Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution
    Ramadan, Banda
    Christen, Peter
    Liang, Huizhi
    Gayler, Ross W.
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2015, 6 (04):
  • [9] Incremental Blocking for Entity Resolution over Web Streaming Data
    Araujo, Tiago Brasileiro
    Stefanidis, Kostas
    Santos Pires, Carlos Eduardo
    Nummenmaa, Jyrki
    da Nobrega, Thiago Pereira
    2019 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2019), 2019 : 332 - 336
  • [10] Multi-attribute Data Indexing for Query Based Entity Resolution
    Sun C.-C.
    Shen D.-R.
    Xiao Y.-Y.
    Li Y.-K.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (06) : 2331 - 2347