An Infrastructure for Automating Large-scale Performance Studies and Data Processing

Cited by: 0

Authors:

Jayasinghe, Deepal [1]

Kimball, Josh [1]

Zhu, Tao [1]

Choudhary, Siddharth [1]

Pu, Calton [1]

Affiliation:

[1] Georgia Inst Technol, Ctr Expt Res Comp Syst, Atlanta, GA 30332 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA | 2013

Keywords:

Automation; Benchmarking; Cloud; Code Generation; Data Warehouse; ETL; Performance; Visualization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

The cloud has enabled the computing model to shift from traditional data centers to publicly shared computing infrastructure; yet applications leveraging this new computing model can experience performance and scalability issues that arise from the hidden complexities of the cloud. The most reliable path to better understanding these complexities is an empirical approach that relies on collecting data from a large number of performance studies. Armed with this performance data, we can understand what has happened, why it happened, and, more importantly, predict what will happen in the future. However, this approach presents challenges of its own, chiefly in data management. We attempt to mitigate these data challenges by fully automating the performance measurement process. Concretely, we have developed an automated infrastructure that reduces the complexity of the large-scale performance measurement process by generating all the resources necessary to conduct experiments, to collect and process data, and to store and analyze data. In this paper, we focus on the performance data management aspect of our infrastructure.


Pages: 6
