Large-scale Complex Analytics on Semi-structured Datasets using AsterixDB and Spark

Cited by: 5
Authors
Alkowaileet, Wail Y. [1, 2]
Alsubaiee, Sattam [1, 2]
Carey, Michael J. [3, 4]
Westmann, Till [4 ]
Bu, Yingyi [4 ]
Affiliations
[1] KACST, Ctr Complex Engn Syst, Riyadh, Saudi Arabia
[2] MIT, Cambridge, MA 02139 USA
[3] Univ Calif Irvine, Irvine, CA USA
[4] Couchbase, Mountain View, CA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 9 / Issue 13
DOI
10.14778/3007263.3007315
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Large quantities of raw data are being generated by many different sources in different formats. Private and public sectors alike acclaim the valuable information and insights that can be mined from such data to better understand the dynamics of everyday life, such as traffic, worldwide logistics, and social behavior. For this reason, storing, managing, and analyzing "Big Data" at scale is getting a tremendous amount of attention, both in academia and industry. In this paper, we demonstrate the power of a parallel connection that we have built between Apache Spark and Apache AsterixDB (Incubating) to enable complex analytics such as machine learning and graph analysis on data drawn from large semi-structured data collections. The integration of these two systems allows researchers and data scientists to leverage AsterixDB capabilities, including fast ingestion and indexing of semi-structured data and efficient answering of geo-spatial and fuzzy text queries. Complex data analytics can then be performed on the resulting AsterixDB query output in order to obtain additional insights by leveraging the power of Spark's machine learning and graph libraries.
Pages: 1585 - 1588
Page count: 4
Related papers
50 items in total
  • [1] EventDB: A Large-Scale Semi-structured Scientific Data Management System
    Zhao, Wenjia
    Qi, Yong
    Hou, Di
    Wang, Peijian
    Gao, Xin
    Du, Zirong
    Zhang, Yudong
    Zong, Yongfang
    [J]. BIG SCIENTIFIC DATA MANAGEMENT, 2019, 11473 : 105 - 115
  • [2] Towards algorithmic analytics for large-scale datasets
    Bzdok, Danilo
    Nichols, Thomas E.
    Smith, Stephen M.
    [J]. NATURE MACHINE INTELLIGENCE, 2019, 1 (07) : 296 - 306
  • [3] Towards algorithmic analytics for large-scale datasets
    Danilo Bzdok
    Thomas E. Nichols
    Stephen M. Smith
    [J]. Nature Machine Intelligence, 2019, 1 : 296 - 306
  • [4] Distributed poly-square mapping for large-scale semi-structured quad mesh generation
    Liu, Celong
    Yu, Wuyi
    Chen, Zhonggui
    Li, Xin
    [J]. COMPUTER-AIDED DESIGN, 2017, 90 : 5 - 17
  • [5] Clustering Heterogeneous Semi-Structured Social Science Datasets
    Skillicorn, D. B.
    Leuprecht, C.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE, 2015, 51 : 2908 - 2912
  • [6] Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark
    Thuong-Cang Phan
    Anh-Cang Phan
    Thi-To-Quyen Tran
    Ngoan-Thanh Trieu
    [J]. ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING (ICCSAMA 2019), 2020, 1121 : 391 - 402
  • [7] Topic Segmentation of Semi-structured and Unstructured Conversational Datasets Using Language Models
    Ghosh, Reshmi
    Kajal, Harjeet Singh
    Kamath, Sharanya
    Shrivastava, Dhuri
    Basu, Samyadeep
    Zeng, Hansi
    Srinivasan, Soundararajan
    [J]. INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 91 - 104
  • [8] Hierarchical Event Descriptors (HED) Semi-Structured Tagging for Real-World Events in Large-Scale EEG
    Bigdely-Shamlo, Nima
    Cockfield, Jeremy
    Makeig, Scott
    Rognon, Thomas
    La Vallee, Chris
    Miyakoshi, Makoto
    Robbins, Kay A.
    [J]. FRONTIERS IN NEUROINFORMATICS, 2016, 10
  • [9] JSON Tiles: Fast Analytics on Semi-Structured Data
    Durner, Dominik
    Leis, Viktor
    Neumann, Thomas
    [J]. SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021 : 445 - 458
  • [10] Predictive Analytics for Semi-structured Case Oriented Business Processes
    Lakshmanan, Geetika T.
    Duan, Songyun
    Keyser, Paul T.
    Curbera, Francisco
    Khalaf, Rania
    [J]. BUSINESS PROCESS MANAGEMENT WORKSHOPS, 2011, 66 : 640 - 651