An Infrastructure for Automating Large-scale Performance Studies and Data Processing

Cited by: 0

Authors:

Jayasinghe, Deepal [1]

Kimball, Josh [1]

Zhu, Tao [1]

Choudhary, Siddharth [1]

Pu, Calton [1]

Affiliations:

[1] Georgia Institute of Technology, Center for Experimental Research in Computer Systems, Atlanta, GA 30332 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA | 2013

Keywords:

Automation; Benchmarking; Cloud; Code Generation; Data Warehouse; ETL; Performance; Visualization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The Cloud has enabled the computing model to shift from traditional data centers to publicly shared computing infrastructure; yet, applications leveraging this new computing model can experience performance and scalability issues, which arise from the hidden complexities of the cloud. The most reliable path for better understanding these complexities is an empirically based approach that relies on collecting data from a large number of performance studies. Armed with this performance data, we can understand what has happened, why it happened, and more importantly, predict what will happen in the future. However, this approach presents challenges itself, namely in the form of data management. We attempt to mitigate these data challenges by fully automating the performance measurement process. Concretely, we have developed an automated infrastructure, which reduces the complexity of the large-scale performance measurement process by generating all the necessary resources to conduct experiments, to collect and process data and to store and analyze data. In this paper, we focus on the performance data management aspect of our infrastructure.

Pages: 6
