An Infrastructure for Automating Large-scale Performance Studies and Data Processing

Authors
Jayasinghe, Deepal [1]
Kimball, Josh [1]
Zhu, Tao [1]
Choudhary, Siddharth [1]
Pu, Calton [1]
Affiliation
[1] Georgia Institute of Technology, Center for Experimental Research in Computer Systems, Atlanta, GA 30332, USA
Keywords
Automation; Benchmarking; Cloud; Code Generation; Data Warehouse; ETL; Performance; Visualization
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Cloud has enabled the computing model to shift from traditional data centers to publicly shared computing infrastructure; yet, applications leveraging this new computing model can experience performance and scalability issues, which arise from the hidden complexities of the cloud. The most reliable path for better understanding these complexities is an empirically based approach that relies on collecting data from a large number of performance studies. Armed with this performance data, we can understand what has happened, why it happened, and more importantly, predict what will happen in the future. However, this approach presents challenges itself, namely in the form of data management. We attempt to mitigate these data challenges by fully automating the performance measurement process. Concretely, we have developed an automated infrastructure, which reduces the complexity of the large-scale performance measurement process by generating all the necessary resources to conduct experiments, to collect and process data and to store and analyze data. In this paper, we focus on the performance data management aspect of our infrastructure.
Pages: 6
    Shi, Wedong
    INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2019, 10 (04) : 402 - 414