A Hybrid Data Cleaning Framework Using Markov Logic Networks (Extended Abstract)

Cited by: 1
Authors
Ge, Congcong [1]
Gao, Yunjun [1]
Miao, Xiaoye [2]
Yao, Bin [3]
Wang, Haobo [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Peoples R China
[2] Zhejiang Univ, Ctr Data Sci, Hangzhou, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
DOI
10.1109/ICDE51399.2021.00258
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the growth of dirty data, data cleaning has become a crux of data analysis. In this paper, we propose a novel hybrid data cleaning framework, termed MLNClean, which is capable of learning instantiated rules to supplement insufficient integrity constraints. MLNClean consists of two steps: pre-processing and two-stage data cleaning. In the pre-processing step, MLNClean first infers a set of probable instantiated rules according to a Markov logic network (MLN) and then builds a two-layer MLN index to generate multiple data versions and facilitate the cleaning process. In the two-stage data cleaning step, MLNClean first introduces the concept of a reliability score to clean errors within each data version separately, and then eliminates conflicting values among different data versions using a novel concept of a fusion score. Extensive experimental results on both real and synthetic scenarios demonstrate the effectiveness of MLNClean.
Pages: 2344-2345
Page count: 2
Related papers
50 in total
  • [31] Linking Bank Clients using Graph Neural Networks Powered by Rich Transactional Data: Extended Abstract
    Shumovskaia, Valentina
    Fedyanin, Kirill
    Sukharev, Ivan
    Berestnev, Dmitry
    Panov, Maxim
    2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020), 2020, : 787 - 788
  • [32] A Generalized Framework for Preserving Both Privacy and Utility in Data Outsourcing (Extended Abstract)
    Xie, Shangyu
    Mohammady, Meisam
    Wang, Han
    Wang, Lingyu
    Vaidya, Jaideep
    Hong, Yuan
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 1549 - 1550
  • [33] Managing aging data using persistent views (extended abstract)
    Skyt, J
    Jensen, CS
    COOPERATIVE INFORMATION SYSTEMS, PROCEEDINGS, 2000, 1901 : 132 - 137
  • [34] Hybrid Framework for DBSCAN Algorithm Using Fuzzy Logic
    Beri, Saefia
    Kaur, Kamaljit
    2015 1ST INTERNATIONAL CONFERENCE ON FUTURISTIC TRENDS ON COMPUTATIONAL ANALYSIS AND KNOWLEDGE MANAGEMENT (ABLAZE), 2015, : 383 - 387
  • [35] HYBRID PATTERN-RECOGNITION USING MARKOV NETWORKS
    GREGOR, J
    THOMASON, MG
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1993, 15 (06) : 651 - 656
  • [36] Research on Human Error Risk Evaluation Using Extended Bayesian Networks with Hybrid Data
    Pan, Xing
    Zuo, Dujun
    Zhang, Wenjin
    Hu, Lunhu
    Wang, Huixiong
    Jiang, Jing
    Reliability Engineering and System Safety, 2021, 209
  • [37] Research on Human Error Risk Evaluation Using Extended Bayesian Networks with Hybrid Data
    Pan, Xing
    Zuo, Dujun
    Zhang, Wenjin
    Hu, Lunhu
    Wang, Huixiong
    Jiang, Jing
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2021, 209
  • [38] Data relaying with optimal resource management in wireless sensor networks (extended abstract)
    Benkoczi, R
    Hassanein, H
    Akl, S
    Tai, S
    LCN 2005: 30th Conference on Local Computer Networks, Proceedings, 2005, : 617 - 618
  • [39] High Throughput Secure MPC over Small Population in Hybrid Networks (Extended Abstract)
    Choudhury, Ashish
    Hegde, Aditya
    PROGRESS IN CRYPTOLOGY - INDOCRYPT 2020, 2020, 12578 : 832 - 855
  • [40] Deep Packet Inspection Using Message Passing Networks (Extended Abstract)
    Jain, Divya
    Lakshmi, K. Vasanta
    Shankar, Priti
    RECENT ADVANCES IN INTRUSION DETECTION, RAID 2008, 2008, 5230 : 419 - 420