Certain Reduction Rules useful for De-duplication Algorithm of Indian Demographic Data

Cited by: 0
Authors
Kaushik, Vandana Dixit [1 ]
Bendale, Amit [2 ]
Nigam, Aditya [2 ]
Gupta, Phalguni [2 ]
Affiliations
[1] Harcourt Butler Technol Inst, Dept Comp Sci & Engn, Kanpur 208002, Uttar Pradesh, India
[2] Indian Inst Technol Kanpur, Dept Comp Sci Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Demographic Information; De-duplication; Phonetics; Distance Matrix;
DOI
10.1109/ACCT.2014.85
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
This paper proposes rules that help in designing an efficient de-duplication algorithm based on Indian demographic information containing two name strings per individual, viz. Given Name and Surname. The rules reduce all name strings to generic name strings. Each generic name forms a bin that holds all the name strings that reduce to it, together with their Ids. The database of demographic information thus consists of an array of bins, with each bin represented by a singly linked list. At query time, the top n best matches are determined by searching all neighbouring bins of the reduced query name strings. The performance of the rules has been analyzed on a large demographic database of 5,00,000 (500,000) individuals. The proposed rules are found to reduce the name strings by more than 90%.
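The binning scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the reduction rules, so `reduce_name` below uses placeholder rules (collapse repeated letters, drop non-leading vowels) that are purely an assumption and not the authors' rules. The names `Node`, `BinIndex`, `edit_distance`, and the `radius` parameter are likewise hypothetical. "Neighbouring bins" is interpreted here as bins whose generic keys lie within a small edit distance of the reduced query, which is also an assumption, and a dictionary stands in for the array of bins.

```python
from dataclasses import dataclass
from typing import Optional

VOWELS = set("aeiou")


def reduce_name(name: str) -> str:
    """Placeholder reduction rules (an assumption, NOT the paper's rules):
    lowercase, keep letters only, collapse runs of a repeated letter,
    and drop every vowel except a leading one."""
    out, prev = [], ""
    for c in (ch for ch in name.lower() if ch.isalpha()):
        if c == prev:            # collapse consecutive repeats
            continue
        prev = c
        if out and c in VOWELS:  # drop non-leading vowels
            continue
        out.append(c)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    """One record in a bin; bins are singly linked lists of these."""
    record_id: int
    given: str
    surname: str
    next: Optional["Node"] = None


class BinIndex:
    def __init__(self) -> None:
        # generic (given, surname) key -> head node of the bin's linked list
        self.bins: dict[tuple[str, str], Node] = {}

    def insert(self, record_id: int, given: str, surname: str) -> None:
        key = (reduce_name(given), reduce_name(surname))
        # Prepend to the bin's list in O(1).
        self.bins[key] = Node(record_id, given, surname, self.bins.get(key))

    def query(self, given: str, surname: str, n: int = 3, radius: int = 1) -> list[int]:
        """Return up to n record Ids, best match first.
        Scans all bin keys for simplicity; a real system would index them."""
        qg, qs = reduce_name(given), reduce_name(surname)
        scored = []
        for (g, s), node in self.bins.items():
            if edit_distance(qg, g) + edit_distance(qs, s) > radius:
                continue  # not a neighbouring bin under this assumption
            while node is not None:
                d = (edit_distance(given.lower(), node.given.lower())
                     + edit_distance(surname.lower(), node.surname.lower()))
                scored.append((d, node.record_id))
                node = node.next
        return [rid for _, rid in sorted(scored)[:n]]
```

With these placeholder rules, spelling variants such as "Nigam" and "Nigaam" fall into the same surname key, so a query only compares raw strings within a handful of bins instead of across the whole database.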
Pages: 79 / +
Page count: 3