Data matching method based on triangle inequality theorem

Authors
Wu Y.-P. [1]
Bao W.-D. [1]
Zhang W.-M. [1]
Affiliation
[1] College of Information Systems and Management, National University of Defense Technology, Changsha 410073, Hunan
Source
Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science) | 2010 / Vol. 38 / No. 7
Keywords
Data matching; Metric space; Relative distance; Weak similarity
DOI
10.3969/j.issn.1000-565X.2010.07.006
Abstract
Data matching is an important research direction in the database field. This paper proposes a data matching method that operates in a metric space: it classifies and matches data using the triangle inequality and improves matching efficiency through a multi-round iterative mechanism. The complexity of the method is then analyzed, and its efficiency is verified experimentally. The results indicate that the proposed method makes full use of data characteristics and thereby effectively improves the accuracy, correctness, and recall rate of data matching.
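The abstract does not give the algorithm itself, so the following is only a generic sketch of how the triangle inequality can prune comparisons in a metric space, not the authors' method. For any reference record p, |d(x, p) - d(y, p)| <= d(x, y), so a pair whose pivot distances differ by more than the match threshold cannot match and needs no direct comparison. The function names, the use of edit distance as the metric, the single pivot, and the threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; it satisfies the triangle inequality."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_with_pivot(
    records: Sequence[str],
    pivot: str,
    threshold: int,
    dist: Callable[[str, str], int] = edit_distance,
) -> Tuple[List[Tuple[int, int]], int]:
    """Hypothetical helper: return (matching index pairs, number of
    direct pairwise distance computations actually performed)."""
    # One distance per record to the pivot, computed up front.
    to_pivot = [dist(r, pivot) for r in records]
    pairs: List[Tuple[int, int]] = []
    computed = 0
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            # Triangle-inequality lower bound on dist(records[i], records[j]).
            if abs(to_pivot[i] - to_pivot[j]) > threshold:
                continue  # cannot be within the threshold; skip the comparison
            computed += 1
            if dist(records[i], records[j]) <= threshold:
                pairs.append((i, j))
    return pairs, computed


names = ["john smith", "jon smith", "maria garcia lopez", "maria garcia lopes"]
pairs, computed = match_with_pivot(names, pivot="john smith", threshold=2)
print(pairs, computed)
```

The filter never discards a true match, because the bound is a lower bound on the real distance; it only saves work. How much it saves depends on the choice of pivot, which is why practical schemes typically use several pivots or repeat the partitioning.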
Pages: 33-38
Page count: 5