Certain Reduction Rules useful for De-duplication Algorithm of Indian Demographic Data

Times cited: 0
Authors
Kaushik, Vandana Dixit [1]
Bendale, Amit [2]
Nigam, Aditya [2]
Gupta, Phalguni [2]
Affiliations
[1] Harcourt Butler Technol Inst, Dept Comp Sci & Engn, Kanpur 208002, Uttar Pradesh, India
[2] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Demographic Information; De-duplication; Phonetics; Distance Matrix
DOI
10.1109/ACCT.2014.85
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper proposes certain rules that help in designing an efficient de-duplication algorithm for Indian demographic information consisting of two name strings per individual, viz. Given Name and Surname. The rules reduce every name string to a generic name string. Each generic name forms a bin that contains all name strings reducing to it, together with their IDs. Thus, the demographic database consists of an array of bins, each represented by a singly linked list. At query time, the top n best matches are determined by searching all neighbouring bins of the reduced query name strings. The performance of the rules has been analyzed on a large demographic database of 5,00,000 (500,000) individuals, where the proposed rules are found to reduce the name strings by more than 90%.
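The pipeline described in the abstract (reduce names, bin by generic name, store each bin as a singly linked list, search neighbouring bins for the top n matches) can be sketched in Python as below. This is a minimal illustration under loudly stated assumptions, not the authors' method: the entries in `RULES` are hypothetical placeholders rather than the paper's reduction rules; the generic name is assumed to be the reduced Given Name and reduced Surname joined by a space; "neighbouring bins" is assumed to mean bins whose generic name lies within Levenshtein distance `radius` of the query's generic name; and candidates are ranked by Levenshtein distance on the lowercased full name. The names `reduce_name`, `levenshtein`, `Node`, `BinIndex` and `query` are all hypothetical.

```python
import heapq
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL reduction rules, not the paper's rule set: each pair maps a
# common transliteration variant of an Indian name onto one spelling.
RULES = [
    ("ph", "f"), ("sh", "s"), ("bh", "b"), ("dh", "d"), ("gh", "g"),
    ("kh", "k"), ("th", "t"), ("ee", "i"), ("oo", "u"),
    ("w", "v"), ("z", "j"), ("y", "i"),
]


def reduce_name(name: str) -> str:
    """Reduce one name string to its (assumed) generic form."""
    s = re.sub(r"[^a-z]", "", name.lower())  # keep ASCII letters only
    for old, new in RULES:                   # apply rules in list order
        s = s.replace(old, new)
    return re.sub(r"(.)\1+", r"\1", s)       # collapse repeated letters


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


@dataclass
class Node:
    """One record (ID and full name) in a bin's singly linked list."""
    ident: int
    name: str
    next: Optional["Node"] = None


class BinIndex:
    """Array of bins; bin i is the head of a singly linked list of records."""

    def __init__(self) -> None:
        self.keys: list[str] = []              # generic name of bin i
        self.heads: list[Optional[Node]] = []  # list head of bin i
        self.slot: dict[str, int] = {}         # generic name -> bin index

    @staticmethod
    def generic(given: str, surname: str) -> str:
        return reduce_name(given) + " " + reduce_name(surname)

    def insert(self, ident: int, given: str, surname: str) -> None:
        key = self.generic(given, surname)
        if key not in self.slot:               # open a new bin
            self.slot[key] = len(self.keys)
            self.keys.append(key)
            self.heads.append(None)
        i = self.slot[key]
        # Prepend the record to the bin's linked list in O(1).
        self.heads[i] = Node(ident, f"{given} {surname}".lower(), self.heads[i])

    def query(self, given: str, surname: str, n: int = 5, radius: int = 1):
        """Top-n (distance, id, name) matches drawn from neighbouring bins."""
        qkey = self.generic(given, surname)
        qname = f"{given} {surname}".lower()
        hits = []
        for key, head in zip(self.keys, self.heads):
            if levenshtein(qkey, key) > radius:  # not a neighbouring bin
                continue
            node = head
            while node is not None:              # walk the linked list
                hits.append((levenshtein(qname, node.name), node.ident, node.name))
                node = node.next
        return heapq.nsmallest(n, hits)


if __name__ == "__main__":
    idx = BinIndex()
    idx.insert(1, "Preeti", "Sharma")
    idx.insert(2, "Priti", "Sarma")
    idx.insert(3, "Amit", "Bhatt")
    print(len(idx.keys))  # 2: the first two records share one bin
    print(idx.query("Preety", "Sharma", n=2))
```

In this sketch, "Preeti Sharma" and "Priti Sarma" both reduce to the generic name `priti sarma`, so three records occupy only two bins; that collapsing of spelling variants is what lets a query touch a few bins instead of every record. Finding neighbouring bins here is a linear scan over the bin array, which is simple but is only one of several possible choices.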
Pages: 79 / +
Number of pages: 3