Certain Reduction Rules useful for De-duplication Algorithm of Indian Demographic Data

Authors
Kaushik, Vandana Dixit [1]
Bendale, Amit [2]
Nigam, Aditya [2]
Gupta, Phalguni [2]
Affiliations
[1] Harcourt Butler Technol Inst, Dept Comp Sci & Engn, Kanpur 208002, Uttar Pradesh, India
[2] Indian Inst Technol Kanpur, Dept Comp Sci Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Demographic Information; De-duplication; Phonetics; Distance Matrix
DOI
10.1109/ACCT.2014.85
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper proposes rules that help in designing an efficient de-duplication algorithm for Indian demographic information containing two name strings per individual, viz. Given Name and Surname. The rules reduce all name strings to generic name strings. Each generic name forms a bin that contains all the name strings reducing to it, together with their IDs. Thus, the database of demographic information consists of an array of bins, each represented by a singly linked list. At query time, the top n best matches are determined by searching all neighbouring bins of the reduced query name strings. The performance of the rules has been analyzed on a large demographic database of 5,00,000 individuals. It is found that the proposed rules reduce the name strings by more than 90%.
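The pipeline outlined in the abstract (reduce names to generic strings, group records into bins, search neighbouring bins at query time) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the abstract does not list the actual reduction rules, so `RULES` is a hypothetical set of phonetic substitutions; a Python list stands in for each bin's singly linked list; and "neighbouring bins" is interpreted as bins whose reduced key lies within a small edit distance of the reduced query, which may differ from the paper's definition.

```python
from collections import defaultdict

# Hypothetical reduction rules (NOT taken from the paper): ordered
# (pattern, replacement) pairs applied to the lower-cased name.
RULES = [("ee", "i"), ("oo", "u"), ("aa", "a"), ("sh", "s"), ("w", "v"), ("ph", "f")]


def reduce_name(name):
    """Reduce a name string to a generic form using the assumed RULES."""
    s = name.strip().lower()
    for old, new in RULES:
        s = s.replace(old, new)
    # Collapse runs of the same letter ("tt" -> "t").
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_bins(records):
    """Group (id, given, surname) records by their reduced (given, surname) key.

    A Python list stands in for the singly linked list of each bin.
    """
    bins = defaultdict(list)
    for rec_id, given, surname in records:
        key = (reduce_name(given), reduce_name(surname))
        bins[key].append((rec_id, given, surname))
    return bins


def query(bins, given, surname, n=3, radius=1):
    """Return up to n records from bins within `radius` edits of the reduced query.

    This scans every bin key, which is fine for a sketch but not for a large database.
    """
    qg, qs = reduce_name(given), reduce_name(surname)
    scored = []
    for (g, s), members in bins.items():
        d = edit_distance(qg, g) + edit_distance(qs, s)
        if d <= radius:
            scored.extend((d, rec) for rec in members)
    scored.sort(key=lambda item: item[0])  # stable sort: closest bins first
    return [rec for _, rec in scored[:n]]


# Made-up sample records for illustration only.
records = [
    (1, "Deepak", "Sharma"),
    (2, "Dipak", "Sarma"),
    (3, "Amit", "Gupta"),
    (4, "Deepika", "Sharma"),
]
bins = build_bins(records)
matches = query(bins, "Deepak", "Sharma")
```

With these assumed rules, "Deepak Sharma" and "Dipak Sarma" reduce to the same generic key and share a bin, so a query for either spelling retrieves both records. Raising `radius` widens the search to more distant bins at the cost of more candidates.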
Pages: 79 / +
Page count: 3