An Infrastructure for Automating Large-scale Performance Studies and Data Processing

Authors
Jayasinghe, Deepal [1]
Kimball, Josh [1]
Zhu, Tao [1]
Choudhary, Siddharth [1]
Pu, Calton [1]
Institution
[1] Georgia Inst Technol, Ctr Expt Res Comp Syst, Atlanta, GA 30332 USA
Keywords
Automation; Benchmarking; Cloud; Code Generation; Data Warehouse; ETL; Performance; Visualization
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The cloud has enabled computing to shift from traditional data centers to publicly shared computing infrastructure; yet applications that leverage this new computing model can experience performance and scalability issues arising from the hidden complexities of the cloud. The most reliable path to understanding these complexities is an empirical approach that relies on collecting data from a large number of performance studies. Armed with this performance data, we can understand what happened, why it happened, and, more importantly, predict what will happen in the future. However, this approach presents challenges of its own, chiefly in data management. We attempt to mitigate these data challenges by fully automating the performance measurement process. Concretely, we have developed an automated infrastructure that reduces the complexity of large-scale performance measurement by generating all the resources needed to conduct experiments, collect and process data, and store and analyze it. In this paper, we focus on the performance data management aspect of our infrastructure.
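The abstract describes a pipeline that collects, processes, stores, and analyzes performance data, which is essentially an ETL flow into a data store (ETL and Data Warehouse are among the keywords). The sketch below is a minimal illustration of that general flow under stated assumptions, not the authors' implementation: the comma-separated log format, the table name `response_time`, and the helper functions `extract`, `transform`, `load`, and `mean_by_tier` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw benchmark output, one request per line:
# "<experiment_id>,<tier>,<response_time_ms>". This format is an assumption
# for illustration only, not the log format used by the paper's infrastructure.
RAW_LOG = """\
exp1,web,12.5
exp1,app,30.0
exp1,db,8.5
exp2,web,14.0
exp2,app,41.0
"""


def extract(raw):
    """Extract: split the raw text into non-empty lines."""
    return [line for line in raw.splitlines() if line.strip()]


def transform(lines):
    """Transform: parse each line into a typed (experiment, tier, ms) tuple."""
    rows = []
    for line in lines:
        exp_id, tier, ms = line.split(",")
        rows.append((exp_id, tier, float(ms)))
    return rows


def load(conn, rows):
    """Load: store the parsed rows in a relational table for later analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS response_time (experiment TEXT, tier TEXT, ms REAL)"
    )
    conn.executemany("INSERT INTO response_time VALUES (?, ?, ?)", rows)
    conn.commit()


def mean_by_tier(conn):
    """Analyze: average response time per tier across all experiments."""
    cur = conn.execute(
        "SELECT tier, AVG(ms) FROM response_time GROUP BY tier ORDER BY tier"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(conn, transform(extract(RAW_LOG)))
    print(mean_by_tier(conn))  # → {'app': 35.5, 'db': 8.5, 'web': 13.25}
```

Keeping extraction, transformation, and loading as separate functions mirrors the idea that each stage can be generated or swapped independently; in a real deployment the load target would be a persistent database rather than an in-memory one.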
Pages: 6
Related papers (50 in total)
  • [1] AUTOMATING LARGE-SCALE PROCESSING OF DOSIMETRY DATA
    PAWLYK, DA
    SIEGEL, JA
    SHARKEY, RM
    GOLDENBERG, DM
    JOURNAL OF NUCLEAR MEDICINE, 1993, 34 (05) : P160 - P160
  • [2] Automating Large-Scale Data Quality Verification
    Schelter, Sebastian
    Lange, Dustin
    Schmidt, Philipp
    Celikel, Meltem
    Biessmann, Felix
    Grafberger, Andreas
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (12) : 1781 - 1794
  • [3] A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure
    Shinzato, Keiji
    Kawahara, Daisuke
    Hashimoto, Chikara
    Kurohashi, Sadao
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008 : 2236 - 2241
  • [4] Data consistency in a large-scale runtime infrastructure
    Liu, BQ
    Wang, HM
    Yao, YP
    Proceedings of the 2005 Winter Simulation Conference, Vols 1-4, 2005 : 1787 - 1794
  • [5] SNPP: automating large-scale SNP genotype data management
    Zhao, LJ
    Li, MX
    Guo, YF
    Xu, FH
    Li, JL
    Deng, HW
    BIOINFORMATICS, 2005, 21 (02) : 266 - 268
  • [6] Large-Scale Simulator for Global Data Infrastructure Optimization
    Herrero-Lopez, Sergio
    Williams, John R.
    Sanchez, Abel
    2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2011 : 54 - 64
  • [7] Automating large-scale LEMUF calculations
    Picard, R.R.
    JNMM, Journal of the Institute of Nuclear Materials Management, 1992, 20 (03) : 43 - 46
  • [8] Active disks for large-scale data processing
    Riedel, E
    Faloutsos, C
    Gibson, GA
    Nagle, D
    COMPUTER, 2001, 34 (06) : 68 - +
  • [9] Processing large-scale data with Apache Spark
    Ko, Seyoon
    Won, Joong-Ho
    KOREAN JOURNAL OF APPLIED STATISTICS, 2016, 29 (06) : 1077 - 1094
  • [10] Common Data Elements, Scalable Data Management Infrastructure, and Analytics Workflows for Large-Scale Neuroimaging Studies
    Kuplicki, Rayus
    Touthang, James
    Al Zoubi, Obada
    Mayeli, Ahmad
    Misaki, Masaya
    Aupperle, Robin L.
    Teague, T. Kent
    McKinney, Brett A.
    Paulus, Martin P.
    Bodurka, Jerzy
    FRONTIERS IN PSYCHIATRY, 2021, 12