Certain Reduction Rules useful for De-duplication Algorithm of Indian Demographic Data

Cited by: 0
Authors
Kaushik, Vandana Dixit [1 ]
Bendale, Amit [2 ]
Nigam, Aditya [2 ]
Gupta, Phalguni [2 ]
Affiliations
[1] Harcourt Butler Technol Inst, Dept Comp Sci & Engn, Kanpur 208002, Uttar Pradesh, India
[2] Indian Inst Technol Kanpur, Dept Comp Sci Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Demographic Information; De-duplication; Phonetics; Distance Matrix
DOI
10.1109/ACCT.2014.85
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper proposes a set of rules that help in designing an efficient de-duplication algorithm based on Indian demographic information containing two name strings per individual, viz. Given Name and Surname. The rules reduce every name string to a generic name string. Each generic name defines a bin that holds all the name strings reducing to it, together with their Ids. The demographic database is thus an array of bins, with each bin represented by a singly linked list. At query time, the top n best matches are determined by searching all bins neighbouring the reduced query name strings. The performance of the rules has been analyzed on a large demographic database of 5,00,000 individuals. The proposed rules are found to reduce the name strings by more than 90%.
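The pipeline described in the abstract (reduce names, bin by generic name, keep each bin as a singly linked list, search neighbouring bins at query time) can be illustrated with a minimal Python sketch. The abstract does not list the actual reduction rules or define "neighbouring", so everything specific here is an assumption made for illustration: `reduce_name` uses a few hypothetical phonetic substitutions, a dict stands in for the array of bins, and a bin counts as neighbouring when the summed edit distance between its generic key and the reduced query is at most `radius`.

```python
from dataclasses import dataclass
from typing import Optional


def reduce_name(name: str) -> str:
    """Reduce a name to a generic form. HYPOTHETICAL rules, not the paper's."""
    s = "".join(ch for ch in name.lower() if ch.isalpha())
    # Illustrative phonetic substitutions (assumed, not from the paper).
    for src, dst in (("ph", "f"), ("sh", "s"), ("ee", "i"), ("oo", "u"), ("w", "v")):
        s = s.replace(src, dst)
    # Keep the first letter; drop later vowels, 'y' and 'h'.
    s = s[:1] + "".join(ch for ch in s[1:] if ch not in "aeiouyh")
    # Collapse runs of the same letter.
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    """One entry of a bin's singly linked list."""
    record_id: int
    given: str
    surname: str
    next: Optional["Node"] = None


class BinIndex:
    def __init__(self) -> None:
        # (generic given name, generic surname) -> head of the linked list.
        self.bins: dict = {}

    def insert(self, record_id: int, given: str, surname: str) -> None:
        key = (reduce_name(given), reduce_name(surname))
        # Prepend the new node to the bin's list.
        self.bins[key] = Node(record_id, given, surname, self.bins.get(key))

    def query(self, given: str, surname: str, n: int = 5, radius: int = 1):
        qkey = (reduce_name(given), reduce_name(surname))
        candidates = []
        for key, head in self.bins.items():
            # Assumed notion of "neighbouring bin": generic keys within `radius`.
            if edit_distance(key[0], qkey[0]) + edit_distance(key[1], qkey[1]) > radius:
                continue
            node = head
            while node is not None:
                score = (edit_distance(given.lower(), node.given.lower())
                         + edit_distance(surname.lower(), node.surname.lower()))
                candidates.append((score, node.record_id, node.given, node.surname))
                node = node.next
        candidates.sort()  # lowest distance first
        return candidates[:n]


index = BinIndex()
index.insert(1, "Aditya", "Nigam")
index.insert(2, "Adithya", "Nigam")
index.insert(3, "Amit", "Bendale")
matches = index.query("Aditia", "Nigam", n=2)
```

In this sketch the spelling variants "Aditya" and "Adithya" fall into the same bin, so the query only scores the records in that bin and its near neighbours instead of scanning the whole database. A linear scan over bin keys is used for brevity; a real system would need an indexed way to enumerate neighbouring bins.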
Pages: 79 / +
Number of pages: 3
Related papers
50 in total
  • [21] FBBM: A new backup method with data de-duplication capability
    Yang, Tianming
    Feng, Dan
    Liu, Jingning
    Wan, Yaping
    MUE: 2008 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND UBIQUITOUS ENGINEERING, PROCEEDINGS, 2008, : 30 - +
  • [22] A method for organizing metadata of storage nodes with data de-duplication
    Wang, Guohua
    Zhao, Yuelong
    Li, Tianxiang
    Liao, Jinggui
    Journal of Computational Information Systems, 2014, 10 (09): : 3845 - 3854
  • [23] A data de-duplication access framework for solid state drives
    Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan
    J. Inf. Sci. Eng., 2012, 5 (941-954):
  • [24] Semantic Analysis of Big Data by Applying De-duplication techniques
    Garg, Sanjeev
    Bala, Anju
    2016 INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT), VOL 3, 2015, : 660 - 665
  • [25] A Data De-duplication Access Framework for Solid State Drives
    Wu, Chin-Hsien
    Wu, Hau-Shan
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2012, 28 (05) : 941 - 954
  • [26] An efficient technique for cloud storage using secured de-duplication algorithm
    Mohan, Prakash
    Sundaram, Manikandan
    Satpathy, Sambit
    Das, Sanchali
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (02) : 2969 - 2980
  • [27] GeoDD: End-to-End Spatial Data De-duplication System
    Trokhymovych, Mykola
    Kosovan, Oleksandr
    DATA SCIENCE AND ALGORITHMS IN SYSTEMS, 2022, VOL 2, 2023, 597 : 717 - 727
  • [28] De-Duplication Complexity of Fingerprint Data in Large-Scale Applications
    Nalla Pattabhi Ramaiah
    C. Krishna Mohan
    Journal of Electronic Science and Technology, 2014, (02) : 224 - 228
  • [29] Logical Data Deletion in High-Performance De-duplication Backup
    Yang, Tianming
    Tang, Zhen
    Wan, Yaping
    Sun, Wei
    MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 2519 - +
  • [30] De-Duplication Complexity of Fingerprint Data in Large-Scale Applications
    Nalla Pattabhi Ramaiah
    C. Krishna Mohan
    Journal of Electronic Science and Technology, 2014, 12 (02) : 224 - 228