A Hybrid Data Cleaning Framework Using Markov Logic Networks (Extended Abstract)

Times cited: 1
Authors
Ge, Congcong [1]
Gao, Yunjun [1]
Miao, Xiaoye [2]
Yao, Bin [3]
Wang, Haobo [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Peoples R China
[2] Zhejiang Univ, Ctr Data Sci, Hangzhou, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
DOI
10.1109/ICDE51399.2021.00258
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With the growth of dirty data, data cleaning has become a crux of data analysis. In this paper, we propose a novel hybrid data cleaning framework, termed MLNClean, which is capable of learning instantiated rules to supplement insufficient integrity constraints. MLNClean consists of two steps: pre-processing and two-stage data cleaning. In the pre-processing step, MLNClean first infers a set of probable instantiated rules according to a Markov logic network (MLN) and then builds a two-layer MLN index to generate multiple data versions and facilitate the cleaning process. In the two-stage data cleaning step, MLNClean first introduces the concept of a reliability score to clean errors within each data version separately, and then eliminates conflicting values among different data versions using a novel concept of a fusion score. Extensive experiments on both real and synthetic scenarios demonstrate the effectiveness of MLNClean.
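The abstract does not give the formulas for MLNClean's reliability or fusion scores, so the sketch below illustrates only the general Markov logic network model the framework builds on: an MLN assigns a world x the probability P(x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i and n_i(x) is the number of its true groundings in x. As a toy under stated assumptions, the hypothetical attributes `zip` and `city`, the hypothetical rule "same zip implies same city", its hypothetical weight 1.5, and the helper names are all illustrative; candidate repairs of one dirty tuple are treated as candidate worlds, and normalization runs over those candidates only. It is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
# A formula over a pair of rows; its groundings are ordered pairs of distinct rows.
Formula = Callable[[Row, Row], bool]


def satisfied_groundings(rows: List[Row], formula: Formula) -> int:
    """n_i(world): number of groundings (ordered row pairs, i != j) that make the formula true."""
    return sum(
        1
        for i, a in enumerate(rows)
        for j, b in enumerate(rows)
        if i != j and formula(a, b)
    )


def world_probabilities(worlds: List[List[Row]], rules: List[Tuple[float, Formula]]) -> List[float]:
    """P(world) proportional to exp(sum_i w_i * n_i(world)), normalized over the given candidates only."""
    scores = [sum(w * satisfied_groundings(rows, f) for w, f in rules) for rows in worlds]
    top = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical rule: tuples with the same zip should have the same city (zip -> city).
def zip_implies_city(a: Row, b: Row) -> bool:
    return a["zip"] != b["zip"] or a["city"] == b["city"]


rules = [(1.5, zip_implies_city)]  # hypothetical weight
clean = [{"zip": "12345", "city": "Hangzhou"}, {"zip": "12345", "city": "Hangzhou"}]
world_a = clean + [{"zip": "12345", "city": "Hangzou"}]   # candidate: keep the typo
world_b = clean + [{"zip": "12345", "city": "Hangzhou"}]  # candidate: repair the typo

probs = world_probabilities([world_a, world_b], rules)
print([round(p, 4) for p in probs])  # prints [0.0025, 0.9975]
```

Because the repaired world satisfies all six groundings of the rule while the unrepaired one satisfies only two, the log-linear model strongly prefers the repair. Real MLN systems learn the weights and use approximate inference rather than enumerating worlds.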
Pages: 2344–2345
Number of pages: 2
Related papers
50 in total
  • [41] Analysis of Rachmaninoff's piano performances using inductive logic programming (extended abstract)
    Dovey, MJ
    MACHINE LEARNING: ECML-95, 1995, 912 : 279 - 282
  • [42] Knowledge Discovery from Constrained Relational Data: A Tutorial on Markov Logic Networks
    Spies, Marcus
    BUSINESS INTELLIGENCE, EBISS 2012, 2013, 138 : 78 - 102
  • [43] End-to-end Relation Extraction using Neural Networks and Markov Logic Networks
    Pawar, Sachin
    Bhattacharyya, Pushpak
    Palshikar, Girish K.
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 818 - 827
  • [44] Probabilistic Record Matching for Entity Resolution Using Markov Logic Networks
    Lukluk, Muhammad
    Affandi, Achmad
    Hariadi, Mochamad
    2018 ELECTRICAL POWER, ELECTRONICS, COMMUNICATIONS, CONTROLS, AND INFORMATICS SEMINAR (EECCIS), 2018, : 360 - 364
  • [45] End-to-End Relation Extraction Using Markov Logic Networks
    Pawar, Sachin
    Bhattacharya, Pushpak
    Palshikar, Girish K.
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT II, 2018, 9624 : 535 - 551
  • [46] Qualitative spatial reasoning with uncertain evidence using Markov logic networks
    Duckham, Matt
    Gabela, Jelena
    Kealy, Allison
    Kyprianou, Ross
    Legg, Jonathan
    Moran, Bill
    Rumi, Shakila Khan
    Salim, Flora D.
    Tao, Yaguang
    Vasardani, Maria
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2023, 37 (09) : 2067 - 2100
  • [47] Enhancement and cleaning of handwritten data by using neural networks
    Hidalgo, JL
    España, S
    Castro, MJ
    Pérez, JA
    PATTERN RECOGNITION AND IMAGE ANALYSIS, PT 1, PROCEEDINGS, 2005, 3522 : 376 - 383
  • [48] Situated incremental natural language understanding using Markov Logic Networks
    Kennington, Casey
    Schlangen, David
    COMPUTER SPEECH AND LANGUAGE, 2014, 28 (01) : 240 - 255
  • [50] Confidence Levels for Empirical Research Using Twitter Data Extended Abstract
    Xu, Heng
    Zhang, Nan
    PROCEEDINGS OF THE TECHNOLOGY, MIND, AND SOCIETY CONFERENCE (TECHMINDSOCIETY'18), 2018,