Large-scale Entity Extraction and Probabilistic Record Linkage

Cited by: 0
Author
Villanustre, Flavio [1]
Affiliation
[1] LexisNexis Risk Solut, Reed Elsevier, Alpharetta, GA 30005 USA
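Keywords
Big Data; entity extraction; disambiguation; public data; identity management; record linking; identity fraud
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract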
Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone of a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually do not scale well, are language-specific and require significant maintenance over time. This presentation introduces the audience to the use of probabilistic record linkage, also known as specificity-based linkage, on Big Data to perform language-independent, large-scale entity extraction, resolution and linkage across diverse sources. The presentation also includes a live demonstration that reviews the steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization) and shows the basic concepts behind probabilistic record linkage in a real-world application using the open source big data platform HPCC Systems [1] from LexisNexis.
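
The idea of specificity-based linkage mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the HPCC Systems implementation. It assumes a simplified model in which the specificity of a field value is log2(N / count), so that rarer values carry more weight, and in which two records are scored by summing the specificities of the fields on which they agree exactly. The function names, field names and sample records are hypothetical and invented for illustration.

```python
import math
from collections import Counter

def field_specificities(records, field):
    """Map each value of `field` to log2(N / count): rarer values score higher."""
    counts = Counter(r[field] for r in records if r.get(field))
    total = sum(counts.values())
    return {value: math.log2(total / n) for value, n in counts.items()}

def match_score(a, b, specificities):
    """Sum the specificity of every field on which two records agree exactly."""
    score = 0.0
    for field, table in specificities.items():
        va, vb = a.get(field), b.get(field)
        if va and va == vb:
            score += table[va]
    return score

# Invented sample data (assumption: already cleansed and standardized).
records = [
    {"last": "SMITH", "city": "ATLANTA"},
    {"last": "SMITH", "city": "ATLANTA"},
    {"last": "SMITH", "city": "BOSTON"},
    {"last": "ZAPPALA", "city": "ATLANTA"},
]
spec = {f: field_specificities(records, f) for f in ("last", "city")}

# Agreement on a rare surname is stronger evidence than agreement on a common one.
print(spec["last"]["ZAPPALA"] > spec["last"]["SMITH"])
print(match_score(records[0], records[1], spec))
```

In practice, a pair of records would be linked only when its score exceeds a threshold chosen for the data set. The threshold, and any fuzzy or partial matching of field values, are left out of this sketch.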
Pages: 85-93
Page count: 9