Contextual Data Cleaning with Ontology Functional Dependencies

Cited by: 2
Authors
Zheng, Zheng [1]
Zheng, Longtao [2]
Alipourlangouri, Morteza [1]
Chiang, Fei [1]
Golab, Lukasz [3]
Szlichta, Jaroslaw [4]
Baskaran, Sridevi [1]
Affiliations
[1] McMaster Univ, 1280 Main St, West Hamilton, ON L8S 4K1, Canada
[2] Univ Sci & Technol China, 96 JinZhai Rd, Hefei 230026, Anhui, Peoples R China
[3] Univ Waterloo, 200 Univ Ave W, Waterloo, ON N2L 3G1, Canada
[4] Ontario Tech Univ, 2000 Simcoe St N, Oshawa, ON L1G 0C, Canada
Keywords
Data cleaning; ontology functional dependencies; efficient discovery; model
DOI
10.1145/3524303
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Functional Dependencies define attribute relationships based on syntactic equality, and when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Toward enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. We demonstrate the effectiveness of our techniques on real datasets and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional Functional Dependencies.
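The false-positive problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy ontology, the table, and every function name below are hypothetical, and the synonym check assumes that a group of tuples agreeing on the left-hand side satisfies the dependency when all of its right-hand-side values share at least one ontology sense.

```python
from collections import defaultdict

# Hypothetical toy ontology: each sense (concept) maps to its synonymous names.
ONTOLOGY = {
    "country:US": {"United States", "USA", "America"},
    "country:IN": {"India", "Bharat"},
}

# Hypothetical relation with the dependency code -> country.
ROWS = [
    {"code": "US", "country": "United States"},
    {"code": "US", "country": "USA"},
    {"code": "IN", "country": "India"},
    {"code": "IN", "country": "Indonesia"},
]


def senses_of(value, ontology):
    """Return the senses under which `value` appears; unknown values have none."""
    return {sense for sense, names in ontology.items() if value in names}


def group_by(rows, lhs):
    """Partition rows by their values on the left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row)
    return groups


def fd_violations(rows, lhs, rhs):
    """Traditional FD check: flag groups whose rhs values differ syntactically."""
    return sorted(
        key for key, group in group_by(rows, lhs).items()
        if len({r[rhs] for r in group}) > 1
    )


def synonym_ofd_violations(rows, lhs, rhs, ontology):
    """Assumed synonym check: flag groups whose rhs values share no common sense."""
    bad = []
    for key, group in group_by(rows, lhs).items():
        values = {r[rhs] for r in group}
        if len(values) == 1:
            continue  # syntactically equal values always satisfy the dependency
        common = set.intersection(*(senses_of(v, ontology) for v in values))
        if not common:
            bad.append(key)
    return sorted(bad)


print(fd_violations(ROWS, ["code"], "country"))                      # [('IN',), ('US',)]
print(synonym_ofd_violations(ROWS, ["code"], "country", ONTOLOGY))   # [('IN',)]
```

Under these assumptions the syntactic check flags both groups, while the ontology-aware check accepts "United States" and "USA" as synonyms and reports only the group that mixes "India" with "Indonesia".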
Pages: 26