An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Cited by: 2
Authors
Yuan, Qin [1 ]
Yuan, Ye [1 ]
Wen, Zhenyu [2 ]
Wang, He [1 ]
Tang, Shiyuan [1 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Zhejiang Univ Technol, Hangzhou, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
heterogeneous data lake; relational schema; query answering; similarity search
DOI
10.1145/3539618.3591637
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
There has been growing interest in recent years in cross-source search as a way to gain rich knowledge. A data lake collects massive raw, heterogeneous data with differing data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics, and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source search efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and then creates a global relational schema for the heterogeneous data. It then runs a query answering algorithm over the global schema to find answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in both effectiveness and efficiency.
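The idea of answering one query over a global schema can be illustrated with a minimal sketch. This is not the LakeAns algorithm: the attribute mappings below are hand-written stand-ins for the integration that the paper learns with reinforcement learning, and every source name, attribute name, and record (`orders_table`, `reviews_docs`, `to_global`, `answer`) is a hypothetical placeholder chosen for illustration only.

```python
from typing import Any, Dict, List

# ASSUMPTION: hand-written mappings from each source's own attribute names to
# a shared global schema. The paper derives this integration automatically;
# here it is fixed by hand purely for illustration.
GLOBAL_SCHEMA_MAPPINGS: Dict[str, Dict[str, str]] = {
    "orders_table": {"cust_name": "customer", "item": "product", "amt": "price"},
    "reviews_docs": {"user": "customer", "productTitle": "product", "stars": "rating"},
}

# ASSUMPTION: two toy sources with different local schemas.
SOURCES: Dict[str, List[Dict[str, Any]]] = {
    "orders_table": [
        {"cust_name": "alice", "item": "laptop", "amt": 1200},
        {"cust_name": "bob", "item": "phone", "amt": 800},
    ],
    "reviews_docs": [
        {"user": "alice", "productTitle": "laptop", "stars": 5},
        {"user": "carol", "productTitle": "phone", "stars": 3},
    ],
}


def to_global(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a record's local attributes to their global-schema names."""
    mapping = GLOBAL_SCHEMA_MAPPINGS[source]
    renamed = {mapping[key]: value for key, value in record.items() if key in mapping}
    renamed["_source"] = source  # keep provenance so answers can be traced back
    return renamed


def answer(conditions: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return every record, from any source, matching all equality conditions
    expressed over global-schema attribute names."""
    results = []
    for source, records in SOURCES.items():
        for record in records:
            global_record = to_global(source, record)
            if all(global_record.get(attr) == value for attr, value in conditions.items()):
                results.append(global_record)
    return results
```

Calling `answer({"product": "laptop"})` in this sketch returns one record from each source, because both local attributes (`item` and `productTitle`) map to the same global attribute `product`.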
Pages: 770-780
Number of pages: 11
Related papers
50 in total
  • [21] Approximate Query Answering over Open Data
    Zhang, Mengqi
    Mundra, Pranay
    Chikweze, Chukwubuikem
    Nargesian, Fatemeh
    Weikum, Gerhard
    WORKSHOP ON HUMAN-IN-THE-LOOP DATA ANALYTICS, HILDA 2023, 2023,
  • [22] Data Complexity of Query Answering in Description Logics
    Calvanese, D.
    De Giacomo, G.
    Lembo, D.
    Lenzerini, M.
    Rosati, R.
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 4163 - 4167
  • [23] Smart Query Answering for Marine Sensor Data
    Shahriar, Md. Sumon
    de Souza, Paulo
    Timms, Greg
    SENSORS, 2011, 11 (03) : 2885 - 2897
  • [24] Data complexity of query answering in description logics
    Calvanese, Diego
    De Giacomo, Giuseppe
    Lembo, Domenico
    Lenzerini, Maurizio
    Rosati, Riccardo
    ARTIFICIAL INTELLIGENCE, 2013, 195 : 335 - 360
  • [25] Bandwidth-Efficient Query Answering in Semantically Heterogeneous Grids
    Li, Juan
    Su, Ying
    INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL SCIENCES AND OPTIMIZATION, VOL 1, PROCEEDINGS, 2009, : 37 - +
  • [26] An analytic framework for enhancing the performance of big heterogeneous data analysis
    Salama, Mohamed
    Kader, Hatem Abdul
    Abdelwahab, Amira
    INTERNATIONAL JOURNAL OF ENGINEERING BUSINESS MANAGEMENT, 2021, 13
  • [27] OPTIMA: Framework Selecting Optimal Virtual Model to Query Large Heterogeneous Data
    Belmehdi, Chahrazed B. Bachir
    Khiat, Abderrahmane
    Keskes, Nabil
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2022, 2022, 13428 : 209 - 215
  • [28] Neural-Symbolic Entangled Framework for Complex Query Answering
    Xu, Zezhong
    Zhang, Wen
    Ye, Peng
    Chen, Hui
    Chen, Huajun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [29] View-based query answering and query containment over semistructured data
    Calvanese, D
    De Giacomo, G
    Lenzerini, M
    Vardi, MY
    DATABASE PROGRAMMING LANGUAGES, 2002, 2397 : 40 - 61
  • [30] Query Answering with Transitive and Linear-Ordered Data
    Amarilli, Antoine
    Benedikt, Michael
    Bourhis, Pierre
    Boom, Michael Vanden
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2018, 63 : 191 - 264