Dynamic Indexing for Incremental Entity Resolution in Data Integration Systems

Authors
Vieira, Priscilla Kelly M. [1, 2]
Loscio, Bernadette Farias [1]
Salgado, Ana Carolina [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
[2] Univ Fed Rural Pernambuco, Recife, PE, Brazil
Source
ICEIS: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1 | 2017
Keywords
Data Integration; Entity Resolution; Data Matching; Duplicate Detection; Indexing
DOI
10.5220/0006251801850192
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Entity Resolution (ER) is the problem of identifying groups of tuples from one or multiple data sources that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging in scenarios with dynamic data sources or large volumes of data. Because most ER techniques process all tuples at once, new solutions have been proposed to deal with large volumes of data. One possible approach consists of performing the ER process on query results rather than on the whole data set. It is also possible to reuse the results of previous ER tasks to reduce the number of comparisons between pairs of tuples at query time. Similarly, indexing techniques can also help identify equivalent tuples and reduce the number of pairwise comparisons. In this context, this work proposes an indexing technique for incremental Entity Resolution processes. The expected contributions of this work are the specification, implementation, and evaluation of the proposed indexes. We performed experiments that measured the time spent storing, accessing, and updating the indexes. We concluded that reusing previous results makes the ER process more efficient than reprocessing the tuple comparisons, while producing results of similar quality.
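The general idea of reusing earlier ER decisions through an index can be illustrated with a minimal Python sketch. It is not the indexing technique proposed in the paper, whose structure the abstract does not describe. The class `IncrementalERIndex`, the three-letter blocking key, the similarity threshold, and the sample names are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class IncrementalERIndex:
    """Hypothetical index that remembers earlier ER decisions (illustration only)."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold          # assumed similarity cut-off
        self.cluster_of = {}                # tuple id -> cluster id (reused results)
        self.records = {}                   # tuple id -> name value
        self.blocks = defaultdict(set)      # blocking key -> ids of indexed tuples
        self.comparisons = 0                # pairwise comparisons performed so far
        self._next_cluster = 0

    @staticmethod
    def blocking_key(name):
        # Assumed blocking key: first three characters, lowercased.
        return name.strip().lower()[:3]

    def _similar(self, a, b):
        self.comparisons += 1
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= self.threshold

    def resolve(self, tuple_id, name):
        # Reuse: a tuple resolved by an earlier query needs no new comparison.
        if tuple_id in self.cluster_of:
            return self.cluster_of[tuple_id]
        key = self.blocking_key(name)
        cluster = None
        # Compare only against indexed tuples that share the blocking key.
        for other_id in sorted(self.blocks[key]):
            if self._similar(name, self.records[other_id]):
                cluster = self.cluster_of[other_id]
                break
        if cluster is None:
            cluster = self._next_cluster
            self._next_cluster += 1
        # Update the index so later queries can reuse this decision.
        self.records[tuple_id] = name
        self.cluster_of[tuple_id] = cluster
        self.blocks[key].add(tuple_id)
        return cluster


# Two queries returning the same (made-up) tuples.
index = IncrementalERIndex()
query_result = [(1, "Maria Silva"), (2, "Maria Sillva"), (3, "Joao Souza")]
clusters = {tid: index.resolve(tid, name) for tid, name in query_result}
after_first = index.comparisons
second = {tid: index.resolve(tid, name) for tid, name in query_result}
```

In this sketch the second query performs no additional comparisons, because every tuple is already present in `cluster_of`. That mirrors, in a simplified way, the reuse of previous ER results at query time.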
Pages: 185-192
Page count: 8