A Method for Duplicate Record Detection Based on Decision Tree

Cited by: 0
Authors
Lin, Guangyan [1 ]
Qian, Yuxiang [1 ]
Zhang, Yiqiong [1 ]
Affiliations
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
Keywords
Duplicate Detection; Decision Tree; Data Cleaning; Attribute Similarity; LINKAGE;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Duplicate records are a common problem that widely affects information systems. When computing the similarity of two records, comparing attributes one by one is time-consuming and complex. This paper proposes a duplicate detection method based on a decision tree. Attribute similarity algorithms for common data types are summarized first. On that basis, attribute similarities are mapped to decision tree nodes, so that whether two records are duplicates can be determined early, without computing the similarity of every attribute. Time complexity is reduced significantly while precision is maintained. The experiments achieve precision above 98% and an F-score of 97%.
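The early-decision idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the attribute names (`email`, `name`, `age`), the thresholds, the hand-built tree `TREE`, and the similarity functions (a `difflib` ratio for strings, a relative difference for numbers). The sketch only shows how attribute similarities are computed lazily, one tree node at a time, so a pair of records can be classified before all attributes have been compared.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Similarity in [0, 1] for text attributes (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a, b):
    """Similarity in [0, 1] for non-negative numbers (assumed: 1 - relative difference)."""
    largest = max(abs(a), abs(b))
    return 1.0 if largest == 0 else 1.0 - abs(a - b) / largest


# Hypothetical mapping from attribute name to its similarity function.
SIMILARITY = {
    "email": string_similarity,
    "name": string_similarity,
    "age": numeric_similarity,
}

# Hypothetical hand-built tree. An inner node is
# (attribute, threshold, subtree_if_below, subtree_if_at_or_above);
# a leaf is a bool: True means "duplicate".
TREE = (
    "email", 0.9,
    ("name", 0.8, False, ("age", 0.9, False, True)),
    True,
)


def is_duplicate(rec_a, rec_b, tree=TREE):
    """Walk the tree, computing a similarity only when a node asks for it.

    Returns (decision, compared), where compared lists the attributes
    whose similarity was actually computed.
    """
    compared = []
    node = tree
    while not isinstance(node, bool):
        attr, threshold, below, at_or_above = node
        score = SIMILARITY[attr](rec_a[attr], rec_b[attr])
        compared.append(attr)
        node = at_or_above if score >= threshold else below
    return node, compared


if __name__ == "__main__":
    a = {"name": "John Smith", "email": "jsmith@example.com", "age": 30}
    b = {"name": "Jon Smith", "email": "jsmith@example.com", "age": 31}
    c = {"name": "Alice Wong", "email": "awong@example.org", "age": 52}
    print(is_duplicate(a, b))  # identical e-mail: decided after one comparison
    print(is_duplicate(a, c))  # dissimilar e-mail and name: age is never compared
```

In this sketch, a pair with matching e-mail addresses is classified after a single comparison, and a pair with dissimilar e-mail and name is rejected without touching `age`. In practice the tree would be learned from labeled record pairs rather than written by hand.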
Pages: 146-150
Number of pages: 5
Related papers
50 in total
  • [31] Malicious Traffic Detection Method Based on Decision Tree-SNN Under Small Sample
    Li, Daoquan
    Li, Yuxiu
    Ren, Dayong
    Computer Engineering and Applications, 2023, 59 (21) : 258 - 266
  • [32] An Intrusion Detection Method Based on Decision Tree-Recursive Feature Elimination in Ensemble Learning
    Lian, Wenjuan
    Nie, Guoqing
    Jia, Bin
    Shi, Dandan
    Fan, Qi
    Liang, Yongquan
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [33] EEG feature selection method based on decision tree
    Duan, Lijuan
    Ge, Hui
    Ma, Wei
    Miao, Jun
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2015, 26 : S1019 - S1025
  • [34] An Evaluation Method of ATR Algorithm Based on Decision Tree
    Zhang, Yifei
    Zhou, Bin
    Dou, Hao
    Ming, Delie
    EIGHTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2016), 2016, 10033
  • [35] The Method of Vehicle Ontology Building Based on Decision Tree
    Ma, Bingxian
    Wang, Aixia
    Qu, Shouning
    2009 IITA INTERNATIONAL CONFERENCE ON SERVICES SCIENCE, MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 503 - 506
  • [36] CSDTM - A cost sensitive decision tree based method
    Erray, Walid
    Hacid, Hakim
    ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, 2006, 4243 : 217 - 226
  • [37] The Method to Determine Bibliographic Types Based on Decision Tree
    Geng, Si
    Li, Ning
    Zhao, Lin
    Tian, Ying'ai
    PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON ELECTRICAL & ELECTRONICS ENGINEERING AND COMPUTER SCIENCE (ICEEECS 2016), 2016, 50 : 755 - 761
  • [38] Method of Web Information Extraction Based on Decision Tree
    Chen Hong-ye
    2009 INTERNATIONAL FORUM ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS, 2009, : 664 - 666
  • [39] Link Prediction Method Based on Clustering and Decision Tree
    Yang N.
    Peng T.
    Liu L.
    Liu, Lu (liulu12@mails.jlu.edu.cn), 1795, Science Press (54): : 1795 - 1803
  • [40] Duplicate product record detection engine for e-commerce platforms
    Albayrak, Osman Semih
    Aytekin, Tevfik
    Kalayci, Tolga Ahmet
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 193