Large-scale Entity Extraction and Probabilistic Record Linkage

Cited by: 0
Author
Villanustre, Flavio [1]
Affiliation
[1] LexisNexis Risk Solut, Reed Elsevier, Alpharetta, GA 30005 USA
Keywords
Big Data; entity extraction; disambiguation; public data; identity management; record linking; identity fraud;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone for a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually don't scale properly, are language-specific and require significant maintenance over time. This presentation will introduce the audience to the use of probabilistic record linkage, also known as specificity-based linkage, on Big Data, to perform language-independent large-scale entity extraction, resolution and linkage across diverse sources. The presentation also includes a live demonstration that reviews the different steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization) and shows the basic concepts behind probabilistic record linkage in a real-world application using the open source big data platform HPCC Systems [1] from LexisNexis.
Pages: 85-93
Page count: 9
Related papers
50 in total
  • [1] Hierarchical Linkage Clustering with Distributions of Distances for Large-Scale Record Linkage
    Ventura, Samuel L.
    Nugent, Rebecca
    [J]. PRIVACY IN STATISTICAL DATABASES, PSD 2014, 2014, 8744 : 283 - 298
  • [3] Large-Scale Entity Extraction from Enterprise Data
    Gupta, Rajeev
    Kondapally, Ranganath
    [J]. SECOND INTERNATIONAL CONFERENCE ON AIML SYSTEMS 2022, 2022,
  • [4] Large-scale entity representation learning for biomedical relationship extraction
    Saenger, Mario
    Leser, Ulf
    [J]. BIOINFORMATICS, 2021, 37 (02) : 236 - 242
  • [5] Mobile Access Record Resolution on Large-Scale Identifier-Linkage Graphs
    Shen Xin
    Yang, Hongxia
    Xian, Weizhao
    Ester, Martin
    Bu, Jiajun
    Wang, Zhongyao
    Wang, Can
    [J]. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 886 - 894
  • [6] Large-Scale Collective Entity Matching
    Rastogi, Vibhor
    Dalvi, Nilesh
    Garofalakis, Minos
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 4 (04) : 208 - 218
  • [7] Probabilistic record linkage
    Sayers, Adrian
    Ben-Shlomo, Yoav
    Blom, Ashley W.
    Steele, Fiona
    [J]. INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2016, 45 (03) : 954 - 964
  • [8] Probabilistic queries in large-scale networks
    Pedone, F
    Duarte, NL
    Goulart, M
    [J]. DEPENDABLE COMPUTING: EDCC-4, PROCEEDINGS, 2002, 2485 : 209 - 226
  • [9] Towards Large-Scale Probabilistic OBDA
    Schoenfisch, Joerg
    Stuckenschmidt, Heiner
    [J]. SCALABLE UNCERTAINTY MANAGEMENT (SUM 2015), 2015, 9310 : 106 - 120
  • [10] Large-scale extraction of proteins
    Cunha, T
    Aires-Barros, R
    [J]. MOLECULAR BIOTECHNOLOGY, 2002, 20 (01) : 29 - 40