An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Cited by: 2
Authors
Yuan, Qin [1]
Yuan, Ye [1]
Wen, Zhenyu [2]
Wang, He [1]
Tang, Shiyuan [1]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] Zhejiang University of Technology, Hangzhou, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
heterogeneous data lake; relational schema; query answering; similarity search
DOI
10.1145/3539618.3591637
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Interest in cross-source searching as a way to gain rich knowledge has grown in recent years. A data lake collects massive amounts of raw, heterogeneous data with different data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics, and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source searching efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and create a global relational schema for the heterogeneous data. A query answering algorithm based on the global schema then finds answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in both effectiveness and efficiency.
Pages: 770-780
Number of pages: 11