Octopus: Hybrid Big Data Integration Engine

Cited by: 7
Authors
Chen, Yanjie [1]
Xu, Chenyang [1]
Liu, Qin [1]
Rao, Weixiong [1]
Min, Hong [2]
Su, Gong [2]
Affiliations
[1] Tongji University, School of Software Engineering, Shanghai, People's Republic of China
[2] IBM Watson Research Lab, New York, NY, USA
DOI
10.1109/CloudCom.2015.111
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Large enterprises now maintain huge amounts of data in multiple backend systems, including traditional database systems and the more recently popular big data systems. Telecom providers, for example, keep key business data (e.g., billing information) in database systems, whereas their huge volume of signaling log data resides on HDFS with Hive. Integrating such data and providing consolidated query and analytics is a challenging task. Neither traditional data warehouses nor recent big data systems (e.g., Apache Spark and Hadoop) can fully leverage the power of each backend system. In this paper, we build a hybrid data processing engine, called Octopus, to fully integrate backend systems. Because the data is distributed across backend systems at multiple locations, Octopus focuses on minimizing the amount of data movement. To this end, Octopus proposes a query pushdown technique. A proof-of-concept prototype verifies that Octopus can run much faster than Spark. For example, Octopus processes an aggregation query 5.25x faster than the recent Spark version 1.4.0.
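The idea of query pushdown mentioned in the abstract can be illustrated with a minimal sketch. This is not the Octopus implementation: it assumes an in-memory SQLite database as a stand-in for one backend system, and the table name `signaling_log`, its columns, and the helper function names are hypothetical. It only shows why sending an aggregation to the backend moves fewer rows than pulling the raw data to the integration layer.

```python
import sqlite3


def make_backend(rows):
    """Create an in-memory SQLite database standing in for one backend system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signaling_log (user_id INTEGER, bytes INTEGER)")
    conn.executemany("INSERT INTO signaling_log VALUES (?, ?)", rows)
    return conn


def aggregate_without_pushdown(conn):
    """Pull every row to the integration layer, then aggregate there."""
    moved = conn.execute("SELECT user_id, bytes FROM signaling_log").fetchall()
    totals = {}
    for user_id, nbytes in moved:
        totals[user_id] = totals.get(user_id, 0) + nbytes
    return totals, len(moved)


def aggregate_with_pushdown(conn):
    """Send the aggregation to the backend; only one row per group is moved."""
    moved = conn.execute(
        "SELECT user_id, SUM(bytes) FROM signaling_log GROUP BY user_id"
    ).fetchall()
    return dict(moved), len(moved)


# Hypothetical data: 1000 log rows spread over 10 users.
rows = [(i % 10, i) for i in range(1000)]
backend = make_backend(rows)

plain_totals, plain_moved = aggregate_without_pushdown(backend)
pushed_totals, pushed_moved = aggregate_with_pushdown(backend)

# Both strategies produce the same answer but move different row counts.
print(plain_moved, pushed_moved)
```

In this toy setup both functions return the same per-user totals, while the pushed-down version transfers one row per group instead of one row per log entry. A real engine would also have to decide which parts of a query each backend can execute, which this sketch does not attempt.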
Pages: 462-466
Number of pages: 5