Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Authors
Nanayakkara, Charini [1, 2]
Christen, Peter [1, 2]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] Univ Edinburgh, Scottish Ctr Adm Data Res SCADR, Edinburgh, Midlothian, Scotland
Funding
UK Economic and Social Research Council
Keywords
Scalability; personal data; spatial constraint; temporal constraint
DOI
10.1145/3511808.3557631
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied to large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, only a few of which correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly parameter sensitive and requires trial and error to determine the optimal trade-off between blocking quality and efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99%, whereas the final average linkage precision can be improved by 28% at the cost of a reduction in the average recall by 4%.
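
The general idea in the abstract (min-hash LSH blocking, followed by filtering on block size and on a temporal constraint) can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields `name` and `year`, the helper names, and the parameters `num_bands`, `rows_per_band`, `max_block_size`, and `max_year_gap` are all assumptions chosen for illustration, and the spatial constraint and min-hash similarity filter mentioned in the abstract are omitted.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Return the set of character q-grams of a string (hypothetical tokenizer)."""
    text = text.lower()
    grams = {text[i:i + q] for i in range(len(text) - q + 1)}
    return grams or {text}  # fall back to the whole string if it is shorter than q


def minhash_signature(tokens, num_hashes):
    """Compute one min-hash value per seeded hash function."""
    return [
        min(int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
            for tok in tokens)
        for seed in range(num_hashes)
    ]


def lsh_blocks(records, num_bands=4, rows_per_band=2):
    """Place record ids in the same block when their signatures agree on a band."""
    blocks = defaultdict(set)
    for rec_id, rec in records.items():
        sig = minhash_signature(qgrams(rec["name"]), num_bands * rows_per_band)
        for band in range(num_bands):
            start = band * rows_per_band
            key = (band, tuple(sig[start:start + rows_per_band]))
            blocks[key].add(rec_id)
    return blocks


def candidate_pairs(records, blocks, max_block_size=50, max_year_gap=1):
    """Keep pairs from reasonably sized blocks whose years are close enough.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    pairs = set()
    for members in blocks.values():
        if len(members) < 2 or len(members) > max_block_size:
            continue  # skip singleton blocks and oversized (skewed) blocks
        for a, b in combinations(sorted(members), 2):
            if abs(records[a]["year"] - records[b]["year"]) <= max_year_gap:
                pairs.add((a, b))
    return pairs


records = {
    "r1": {"name": "mary smith", "year": 1880},
    "r2": {"name": "mary smith", "year": 1881},
    "r3": {"name": "mary smith", "year": 1950},
}
pairs = candidate_pairs(records, lsh_blocks(records))
```

In this toy data all three records share every block because their names are identical, but the temporal filter keeps only the pair whose years differ by at most one, so the 1950 record is never compared with the other two.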
Pages: 4354-4358
Page count: 5