Entity Resolution for Big Data

Cited by: 0
Authors
Getoor, Lise [1]
Machanavajjhala, Ashwin [2]
Affiliations
[1] Univ Maryland, Comp Sci Dept, College Pk, MD 20742 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
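Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405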
Abstract
Entity resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Accurate and fast entity resolution has significant practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on entity resolution, there is still a surprising diversity of approaches and a lack of guiding theory. Meanwhile, in the age of big data, the need for high-quality entity resolution is growing, as we are inundated with more and more data, all of which needs to be integrated, aligned and matched before further utility can be extracted. In this tutorial, we bring together perspectives on entity resolution from a variety of fields, including databases, information retrieval, natural language processing and machine learning, to provide, in one setting, a survey of a large body of work. We discuss both the practical aspects and the theoretical underpinnings of ER. We describe existing solutions, current challenges and open research problems. In addition to giving attendees a thorough understanding of existing ER models, algorithms and evaluation methods, the tutorial will cover important research topics such as scalable ER, active and lightly supervised ER, and query-driven ER.
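To make the "matching and resolving" steps and the scalability concern concrete, here is a minimal illustrative sketch in Python. It is not a method taken from the tutorial. It assumes mentions are already extracted (plain name strings) and shows one common baseline pipeline: blocking to avoid comparing every pair, a pairwise matcher, and union-find to resolve matched pairs into entities. The blocking key, the `difflib`-based similarity, the 0.8 threshold and all function names (`normalize`, `block_key`, `similar`, `resolve`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop periods and collapse whitespace so formatting noise does not block a match."""
    return " ".join(name.lower().replace(".", " ").split())


def block_key(name: str) -> str:
    """Hypothetical blocking key: first three letters of the last token (assumed to be a surname)."""
    return normalize(name).split()[-1][:3]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Pairwise matcher: difflib similarity ratio compared against an assumed threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def resolve(mentions: list[str], threshold: float = 0.8) -> list[set[str]]:
    """Group mentions into entities: block, match pairs within each block, then merge matches transitively."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Blocking: only mentions sharing a key are compared, instead of all O(n^2) pairs.
    blocks: dict[str, list[int]] = defaultdict(list)
    for idx, mention in enumerate(mentions):
        blocks[block_key(mention)].append(idx)

    # Matching: score every pair inside a block and link the ones above the threshold.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similar(mentions[i], mentions[j], threshold):
                union(i, j)

    # Resolution: each union-find component becomes one resolved entity.
    clusters: dict[int, set[str]] = defaultdict(set)
    for idx, mention in enumerate(mentions):
        clusters[find(idx)].add(mention)
    return list(clusters.values())
```

For example, `resolve(["Lise Getoor", "L. Getoor", "Joe Getz"])` places the first two mentions in one entity and `Joe Getz` in its own, even though all three share the blocking key `get`. Union-find makes matching transitive, which is simple but can over-merge chains of near-matches; that weakness is one reason more principled ER models exist.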
Pages: 1525 / 1525
Number of pages: 1
Related papers
50 in total
  • [21] Entity resolution for distributed probabilistic data
    Ayat, Naser
    Akbarinia, Reza
    Afsarmanesh, Hamideh
    Valduriez, Patrick
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2013, 31 (04) : 509 - 542
  • [22] Incremental entity resolution on rules and data
    Whang, Steven Euijong
    Garcia-Molina, Hector
    [J]. The VLDB Journal, 2014, 23 : 77 - 102
  • [23] Entity deduplication in big data graphs for scholarly communication
    Manghi, Paolo
    Atzori, Claudio
    De Bonis, Michele
    Bardi, Alessia
    [J]. DATA TECHNOLOGIES AND APPLICATIONS, 2020, 54 (04) : 409 - 435
  • [24] Populating Entity Name Systems for Big Data Integration
    Kejriwal, Mayank
    [J]. SEMANTIC WEB - ISWC 2014, PT II, 2014, 8797 : 521 - 528
  • [25] Data Augmentation for Entity Resolution: A comparative evaluation
    Rettenmeier, Tobias
    Jesser, Alexander
    [J]. 2024 IEEE 3RD INTERNATIONAL CONFERENCE ON COMPUTING AND MACHINE INTELLIGENCE, ICMI 2024, 2024,
  • [26] The Five Generations of Entity Resolution on Web Data
    Nikoletos, Konstantinos
    Ioannou, Ekaterini
    Papadakis, George
    [J]. WEB ENGINEERING, ICWE 2024, 2024, 14629 : 469 - 473
  • [27] User and Entity Behavior Analysis under Urban Big Data
    Tian, Zhihong
    Luo, Chaochao
    Lu, Hui
    Su, Shen
    Sun, Yanbin
    Zhang, Man
    [J]. ACM/IMS Transactions on Data Science, 2020, 1 (03):
  • [28] Entity reconciliation in big data sources: A systematic mapping study
    Enriquez, J. G.
    Dominguez-Mayo, F. J.
    Escalona, M. J.
    Ross, M.
    Staples, G.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 80 : 14 - 27
  • [29] Modelling Entity Integrity for Semi-structured Big Data
    Litvinenko, Ilya
    Wei, Ziheng
    Link, Sebastian
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT I, 2021, 12681 : 113 - 120
  • [30] The Research and Design of Big Data Trading Platform for Entity Business
    Zou, Qian-Ying
    Luo, Lan
    [J]. PROCEEDINGS OF THE 3RD ANNUAL INTERNATIONAL CONFERENCE ON ELECTRONICS, ELECTRICAL ENGINEERING AND INFORMATION SCIENCE (EEEIS 2017), 2017, 131 : 519 - 524