Certain Reduction Rules useful for De-duplication Algorithm of Indian Demographic Data

Cited by: 0

Authors:

Kaushik, Vandana Dixit [1]

Bendale, Amit [2]

Nigam, Aditya [2]

Gupta, Phalguni [2]

Affiliations:

[1] Hartcourt Butler Technol Inst, Dept Comp Sci & Engn, Kanpur 208002, Uttar Pradesh, India

[2] Indian Inst Technol Kanpur, Dept Comp Sci Engn, Kanpur 208016, Uttar Pradesh, India

Source:

2014 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION TECHNOLOGIES (ACCT 2014) | 2014

Keywords:

Demographic Information; De-duplication; Phonetics; Distance Matrix;

DOI:

10.1109/ACCT.2014.85

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

This paper proposes rules that help in designing an efficient de-duplication algorithm for Indian demographic information containing two name strings per individual, viz. Given Name and Surname. The rules reduce every name string to a generic name string. Each generic name forms a bin that holds the name strings reducing to it, together with their Ids. The demographic database thus consists of an array of bins, each represented by a singly linked list. At query time, the top n best matches are determined by searching all bins neighbouring the reduced query name strings. The performance of the rules has been analyzed on a large demographic database of 5,00,000 individuals. The proposed rules are found to reduce the name strings by more than 90%.
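The abstract does not list the reduction rules themselves, so the following Python sketch only illustrates the general shape of the pipeline it describes: reduce names to generic strings, store records in bins kept as singly linked lists, and answer a query from neighbouring bins. Everything specific here is an assumption made for illustration and is not taken from the paper: the rules inside `reduce_name`, the use of edit distance to decide which bins count as "neighbouring", the `radius` parameter, and the names `Node` and `BinIndex`.

```python
from dataclasses import dataclass
from typing import Optional


def reduce_name(name: str) -> str:
    """Reduce a name to a generic string using HYPOTHETICAL rules (not the paper's)."""
    s = "".join(ch for ch in name.lower() if ch.isalpha())
    # Assumed phonetic rewrites; the order matters ("ksh" must run before "sh").
    for old, new in (("ksh", "x"), ("ph", "f"), ("sh", "s"),
                     ("w", "v"), ("ee", "i"), ("oo", "u")):
        s = s.replace(old, new)
    # Collapse runs of the same letter.
    collapsed = []
    for ch in s:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    s = "".join(collapsed)
    if not s:
        return ""
    # Keep the first letter, then drop every later vowel.
    return s[0] + "".join(ch for ch in s[1:] if ch not in "aeiou")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    """One record in a bin; bins are singly linked lists of these nodes."""
    person_id: int
    given: str
    surname: str
    next: Optional["Node"] = None


class BinIndex:
    def __init__(self) -> None:
        # Maps (generic given name, generic surname) to the head of a linked list.
        self.bins = {}

    def add(self, person_id: int, given: str, surname: str) -> None:
        key = (reduce_name(given), reduce_name(surname))
        # Prepend the new record to the bin's list.
        self.bins[key] = Node(person_id, given, surname, self.bins.get(key))

    def query(self, given: str, surname: str, n: int = 3, radius: int = 1) -> list:
        """Return up to n Ids from bins whose keys lie within `radius` edits of the query key."""
        qg, qs = reduce_name(given), reduce_name(surname)
        scored = []
        for (kg, ks), head in self.bins.items():  # linear scan over bins, for simplicity
            if edit_distance(qg, kg) + edit_distance(qs, ks) > radius:
                continue
            node = head
            while node is not None:
                d = (edit_distance(given.lower(), node.given.lower())
                     + edit_distance(surname.lower(), node.surname.lower()))
                scored.append((d, node.person_id))
                node = node.next
        scored.sort()
        return [pid for _, pid in scored[:n]]


index = BinIndex()
index.add(1, "Vandana", "Dixit")
index.add(2, "Vandna", "Dikshit")
index.add(3, "Amit", "Gupta")
matches = index.query("Vandana", "Dixit", n=2)
```

Under these assumed rules, the spelling variants "Vandana Dixit" and "Vandna Dikshit" fall into the same bin, so the query only scores the records of that bin and its near neighbours rather than the whole database. A real system would replace the linear scan over bin keys with a structure that enumerates neighbouring keys directly.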


Pages: 79 / +

Page count: 3

50 related records in total

[31] Data De-duplication and Event Processing for Security Applications on an Embedded Processor
Nagarajaiah, Harsha
Upadhyaya, Shambhu
Gopal, Vinodh
2012 31ST INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2012), 2012, : 418 - 423
[32] De-duplication scheduling strategy in real-time data warehouse
Liu, Hui
Song, Jie
Wu, Jin Bo
Bao, Yu-Bin
Open Cybernetics and Systemics Journal, 2015, 9 (01): : 37 - 43
[33] Data Storage Layout for Object-based De-duplication System
Yan, Fang
Tan, YuAn
SENSORS, MEASUREMENT AND INTELLIGENT MATERIALS, PTS 1-4, 2013, 303-306 : 2284 - 2288
[34] DBSCAN-Based Automatic De-Duplication for Software Quality Inspection Data
Cao, Chun-Hua
Tang, Ya-Na
Zhou, Hua
Li, Yu-Li
Marszalek, Zbigniew
IEEE ACCESS, 2023, 11 : 17882 - 17890
[35] Large-Scale Data Management System Using Data De-duplication System
Abirami, S.
Vikraman, Rashmi
Murugappan, S.
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1, 2016, 379 : 225 - 234
[36] An Effective Data Storage Model for Cloud Databases using Temporal Data De-duplication Approach
Muthurajkumar, S.
Vijayalakshmi, M.
Kannan, A.
2016 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2017, : 42 - 45
[37] Flexible yet Secure De-duplication Service for Enterprise Data on Cloud Storage
Chuan, Wen Bing
Ren, Shu Qin
Keoh, Sye Loong
Aung, Khin Mi Mi
2015 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING RESEARCH AND INNOVATION (ICCCRI), 2015, : 37 - 44
[38] Energy-Efficient De-Duplication Mechanism for Healthcare Data Aggregation in IoT
Khan, Muhammad Nafees Ulfat
Cao, Weiping
Tang, Zhiling
Ullah, Ata
Pan, Wanghua
FUTURE INTERNET, 2024, 16 (02)
[39] An Undirected Graph Traversal based Grouping Prediction Method for Data De-duplication
Wang, Longxiang
Zhang, Xingjun
Zhu, Guofeng
Zhu, Yueguang
Dong, Xiaoshe
2013 14TH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD 2013), 2013, : 3 - 8
[40] Object-based data de-duplication method for OpenXML compound files
School of Computer Science & Technology, Beijing Institute of Technology, Beijing
100086, China
Unknown
101149, China
Jisuanji Yanjiu yu Fazhan, 7 (1546-1557):
