Assisting Developers of Big Data Analytics Applications When Deploying on Hadoop Clouds

Authors
Shang, Weiyi [1]
Jiang, Zhen Ming [1]
Hemmati, Hadi [1]
Adams, Bram [2]
Hassan, Ahmed E. [1]
Martin, Patrick [3]
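Affiliations
[1] Queen's University, School of Computing, Software Analysis and Intelligence Lab (SAIL), Kingston, ON, Canada
[2] Polytechnique Montréal, Department of Computer and Software Engineering, Montréal, QC, Canada
[3] Queen's University, School of Computing, Database Systems Laboratory, Kingston, ON, Canada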
Keywords
Big-Data Analytics Application; Cloud Computing; Monitoring and Debugging; Log Analysis; Hadoop; System
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new type of software application that analyzes big data using massively parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps with cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily available yet rarely used execution logs from these platforms. It abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also yields very few false positives when identifying deployment failures.
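The abstract names three steps: abstract the execution logs, recover execution sequences, and compare the sequences between the pseudo-cloud and large-scale cloud deployments. The Python sketch below illustrates those steps on toy data. It is not the authors' tool: the `<task-id> <message>` log format, the regular-expression abstraction rules, grouping events by task ID, collapsing consecutive repeated events, and the set-difference comparison are all hypothetical choices made for illustration, as are the names `abstract_line`, `recover_sequences`, and `compare_deployments`.

```python
import re
from collections import defaultdict

# Hypothetical abstraction rules: dynamic values in a log message are
# replaced by placeholders so that messages differing only in IDs or
# numbers map to the same event template.
_ATTEMPT_ID = re.compile(r"attempt_\w+")
_NUMBER = re.compile(r"\b\d+\b")


def abstract_line(message: str) -> str:
    """Map a concrete log message to an event template."""
    template = _ATTEMPT_ID.sub("$id", message)
    return _NUMBER.sub("$num", template)


def recover_sequences(lines: list[str]) -> set[tuple[str, ...]]:
    """Group abstracted events by task ID (first token), in log order.

    Consecutive repeats of the same event are collapsed to one, so a task
    that logs an event 3 times in a row and one that logs it 300 times
    produce the same sequence (a simplifying assumption).
    """
    sequences: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        task_id, _, message = line.partition(" ")
        event = abstract_line(message)
        seq = sequences[task_id]
        if not seq or seq[-1] != event:
            seq.append(event)
    return {tuple(seq) for seq in sequences.values()}


def compare_deployments(pseudo_lines, cloud_lines):
    """Report event sequences that appear in only one deployment."""
    pseudo = recover_sequences(pseudo_lines)
    cloud = recover_sequences(cloud_lines)
    return {"only_in_pseudo": pseudo - cloud, "only_in_cloud": cloud - pseudo}


# Toy logs in the assumed "<task-id> <message>" format (invented data).
pseudo_log = [
    "attempt_01_m_0 Reading split 0",
    "attempt_01_m_0 Map output 120 records",
    "attempt_01_m_0 Task attempt_01_m_0 done",
]
cloud_log = [
    "attempt_07_m_3 Reading split 3",
    "attempt_07_m_3 Map output 98 records",
    "attempt_07_m_3 Task attempt_07_m_3 done",
    "attempt_07_m_4 Reading split 4",
    "attempt_07_m_4 Lost connection to datanode 10",
    "attempt_07_m_4 Task attempt_07_m_4 done",
]

if __name__ == "__main__":
    diff = compare_deployments(pseudo_log, cloud_log)
    for seq in sorted(diff["only_in_cloud"]):
        print(" -> ".join(seq))
    # prints: Reading split $num -> Lost connection to datanode $num -> Task $id done
```

The normal map-task sequence is identical in both toy deployments once IDs and counts are abstracted away, so only the cloud-only sequence containing the lost-connection event is reported. Set difference keeps the report short, in the spirit of directing attention to the major differences, but it ignores how often each sequence occurs; counting occurrences would be a natural extension.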
Pages: 402-411
Number of pages: 10
Related papers (50 in total)
  • [1] Moving Hadoop to the Cloud for Big Data Analytics
    Astrova, Irina
    Koschel, Arne
    Heine, Felix
    Kalja, Ahto
    [J]. DATABASES AND INFORMATION SYSTEMS X (DB&IS 2018), 2019, 315 : 195 - 209
  • [2] Clouds for Scalable Big Data Analytics
    Talia, Domenico
    [J]. COMPUTER, 2013, 46 (05) : 98 - 101
  • [3] The Emerging Hadoop, Analytics, Stream Stack for Big Data
    Bernstein, David
    [J]. IEEE CLOUD COMPUTING, 2014, 1 (04) : 84 - 86
  • [4] Shared Disk Big Data Analytics with Apache Hadoop
    Mukherjee, Anirban
    Datta, Joydip
    Jorapur, Raghavendra
    Singhvi, Ravi
    Haloi, Saurav
    Akram, Wasim
    [J]. 2012 19TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2012
  • [5] Big data analytics with applications
    Bi, Zhuming
    Cochran, David
    [J]. JOURNAL OF MANAGEMENT ANALYTICS, 2014, 1 (04) : 249 - 265
  • [6] BigDataDIRAC: deploying distributed Big Data applications
    Fernandez, Victor
    Mendez, Victor
    Pena, Tomas F.
    [J]. 2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015 : 1177 - 1180
  • [7] Optimizing Hadoop Performance for Big Data Analytics in Smart Grid
    Khan, Mukhtaj
    Huang, Zhengwen
    Li, Maozhen
    Taylor, Gareth A.
    Ashton, Phillip M.
    Khan, Mushtaq
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2017, 2017
  • [8] Big Data Analytics- Recommendation System with Hadoop Framework
    Kadam, Sayali D.
    Motwani, Dilip
    Vaidya, Siddhesh A.
    [J]. 2016 INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT), VOL 3, 2015 : 906 - 910
  • [9] Taxonomy on the integration of Hadoop and Rapid Miner for Big Data Analytics
    Utmal, Meghna
    Pandey, R. K.
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015 : 890 - 893
  • [10] An overview of Hadoop applications in transportation big data
    Ma, Changxi
    Zhao, Mingxi
    Zhao, Yongpeng
    [J]. JOURNAL OF TRAFFIC AND TRANSPORTATION ENGINEERING-ENGLISH EDITION, 2023, 10 (05) : 900 - 917