Citus: Distributed PostgreSQL for Data-Intensive Applications

Cited by: 10
Authors
Cubukcu, Umur [1]
Erdogan, Ozgun [1]
Pathak, Sumedh [1]
Sannakkayala, Sudhakar [1]
Slot, Marco [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
postgresql; distributed database; relational database; database extension
DOI
10.1145/3448016.3457551
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Citus is an open source distributed database engine for PostgreSQL that is implemented as an extension. Citus gives users the ability to distribute data, queries, and transactions in PostgreSQL across a cluster of PostgreSQL servers to handle the needs of data-intensive applications. The development of Citus has largely been driven by conversations with companies looking to scale PostgreSQL beyond a single server and their workload requirements. This paper describes the requirements of four common workload patterns and how Citus addresses those requirements. It also shares benchmark results demonstrating the performance and scalability of Citus in each of the workload patterns and describes how Microsoft uses Citus to address one of its most challenging data problems.
Pages: 2490-2502
Page count: 13