An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Cited by: 2
Authors
Yuan, Qin [1]
Yuan, Ye [1]
Wen, Zhenyu [2]
Wang, He [1]
Tang, Shiyuan [1]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Zhejiang Univ Technol, Hangzhou, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
heterogeneous data lake; relational schema; query answering; similarity search
DOI
10.1145/3539618.3591637
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Interest in cross-source search as a way to gain rich knowledge has grown in recent years. A data lake collects massive amounts of raw, heterogeneous data with different data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source search efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and create a global relational schema for the heterogeneous data. It then runs a query answering algorithm over the global schema to find answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in both effectiveness and efficiency.
Pages: 770-780
Page count: 11
Related papers
50 in total
  • [1] A Unified Framework for Flexible Query Answering over Heterogeneous Data Sources
    De Virgilio, Roberto
    Maccioni, Antonio
    Torlone, Riccardo
    FLEXIBLE QUERY ANSWERING SYSTEMS 2015, 2016, 400 : 283 - 294
  • [2] OBDA Constraints for Effective Query Answering
    Hovland, Dag
    Lanti, Davide
    Rezk, Martin
    Xiao, Guohui
    RULE TECHNOLOGIES: RESEARCH, TOOLS, AND APPLICATIONS, 2016, 9718 : 269 - 286
  • [3] Query Answering On Uncertain Big RDF Data Using Apache Spark Framework
    Benbernou, Salima
    Ouziri, Mourad
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018 : 4854 - 4860
  • [4] DC Proposal: Towards a Framework for Efficient Query Answering and Integration of Geospatial Data
    Schneider, Patrik
    SEMANTIC WEB - ISWC 2011, PT II, 2011, 7032 : 349 - 356
  • [5] Efficient and Effective Query Answering for Trajectory Cuboids
    Masciari, Elio
    FLEXIBLE QUERY ANSWERING SYSTEMS, 2011, 7022 : 270 - 281
  • [6] On the Data Complexity of Consistent Query Answering
    ten Cate, Balder
    Fontaine, Gaelle
    Kolaitis, Phokion G.
    THEORY OF COMPUTING SYSTEMS, 2015, 57 (04) : 843 - 891
  • [7] Lynx: A Graph Query Framework for Multiple Heterogeneous Data Sources
    Shen, Zhihong
    Hu, Chuan
    Zhao, Zihao
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12) : 3926 - 3929
  • [8] Data exchange: semantics and query answering
    Fagin, R
    Kolaitis, PG
    Miller, RJ
    Popa, L
    THEORETICAL COMPUTER SCIENCE, 2005, 336 (01) : 89 - 124
  • [9] Data exchange: Semantics and query answering
    Fagin, R
    Kolaitis, PG
    Miller, RJ
    Popa, L
    DATABASE THEORY ICDT 2003, PROCEEDINGS, 2003, 2572 : 207 - 224
  • [10] Flexible query answering in data cubes
    Naouali, S
    Missaoui, R
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2005, 3589 : 221 - 232