LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes

Authors
Chai, Chengliang [1]
Deng, Yuhao [1]
Zhan, Yutong [1]
Cao, Ziqi [1]
Zhang, Yuanfang [1]
Cao, Lei [2]
Wang, Yuping [1]
Zhang, Zhiwei [1]
Yuan, Ye [1]
Wang, Guoren [1]
Tang, Nan [3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Univ Arizona, MIT, Tempe, AZ USA
[3] HKUST, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2024 / Volume 17 / Issue 12
Funding
National Key Research and Development Program of China;
DOI
10.14778/3685800.3685880
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Searching for tables in poorly maintained data lakes has long been recognized as a formidable challenge in data management. There are three pivotal tasks: keyword-based, joinable, and unionable table search. These form the backbone of downstream tasks that aim to make sense of diverse datasets, such as machine learning. In this demo, we propose LakeCompass, an end-to-end prototype system that maintains abundant tabular data, supports all three search tasks with high efficacy, and serves downstream ML modeling well. Specifically, LakeCompass manages numerous real tables, over which diverse types of indexes are built to support efficient search under different user requirements. In particular, LakeCompass can automatically integrate the discovered tables to improve downstream model performance iteratively. Finally, we provide both Python APIs and a Web interface to facilitate flexible user interaction.
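To make the three search tasks concrete, the following is a minimal sketch in plain Python. It is not the LakeCompass API: the function names (`keyword_search`, `joinable_search`, `unionable_search`), the toy `lake` dictionary and its values, and the scoring choices (substring match, value containment, header Jaccard overlap) are all assumptions made for illustration. The abstract states that the real system builds indexes for these searches, whereas this sketch scans every table.

```python
from typing import Dict, List, Tuple

# Toy data lake (made-up values): table name -> {column name -> values}.
Table = Dict[str, List[str]]
lake: Dict[str, Table] = {
    "city_population": {"city": ["Beijing", "Tempe", "Guangzhou"],
                        "population": ["p1", "p2", "p3"]},
    "city_weather": {"city": ["Beijing", "Guangzhou", "Paris"],
                     "avg_temp_c": ["t1", "t2", "t3"]},
    "town_population": {"city": ["Boston", "Austin"],
                        "population": ["p4", "p5"]},
}


def keyword_search(lake: Dict[str, Table], keyword: str) -> List[str]:
    """Return tables whose name or any column header contains the keyword."""
    kw = keyword.lower()
    return sorted(
        name for name, table in lake.items()
        if kw in name.lower() or any(kw in col.lower() for col in table)
    )


def joinable_search(lake: Dict[str, Table], query_column: List[str],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (table, column, score) where score is the fraction of
    query values that also appear in the candidate column (containment)."""
    query = set(query_column)
    hits = []
    for name, table in lake.items():
        for col, values in table.items():
            score = len(query & set(values)) / len(query) if query else 0.0
            if score >= threshold:
                hits.append((name, col, score))
    return sorted(hits, key=lambda h: -h[2])


def unionable_search(lake: Dict[str, Table], query_table: Table,
                     threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (table, score) where score is the Jaccard overlap of headers."""
    q_cols = {c.lower() for c in query_table}
    hits = []
    for name, table in lake.items():
        c_cols = {c.lower() for c in table}
        union = q_cols | c_cols
        score = len(q_cols & c_cols) / len(union) if union else 0.0
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    print(keyword_search(lake, "weather"))
    print(joinable_search(lake, ["Beijing", "Guangzhou"]))
    print(unionable_search(lake, {"city": [], "population": []}))
```

In this toy setting, a joinable hit is a column that could serve as a join key for the query column, and a unionable hit is a table whose schema is close enough to append rows to the query table. A real system would typically compare values or learned representations rather than header strings alone.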
Pages: 4381-4384
Page count: 4