XG: A data-driven computation grid for enterprise-scale mining

Authors
Sion, R [1]
Natarajan, R
Narang, I
Li, WS
Phan, T
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] IBM Corp, TJ Watson Res Lab, Yorktown Hts, NY 10598 USA
[3] IBM Corp, Almaden Res Lab, San Jose, CA 95120 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we introduce a novel architecture for data processing, based on a functional fusion between a data layer and a computation layer. We show how such an architecture can be leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets. One novel contribution of our solution is its data-driven approach: the computation infrastructure is controlled from within the data layer. Grid compute job submission events are based within the query processor on the DBMS side and are in effect controlled by the data processing job to be performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data to be transferred to and from compute nodes, and is in stark contrast to existing Grid solutions that interact with data layers mainly as external "storage". We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics (e.g., linear regression analysis). Experimental results show significant speedups. For example, using a grid of only 12 non-dedicated nodes, we observed a speedup of approximately 1000% in a scenario involving complex linear regression analysis data mining computations for commercial customer profiling.
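The abstract does not describe the aggregation technique in detail, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a simple one-variable linear regression and shows how each data partition could be reduced to a handful of sufficient statistics near the data, so that only those few numbers, rather than the raw rows, need to travel to a compute node. All names (`RegressionStats`, `aggregate_partition`, `fit_partitions`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class RegressionStats:
    """Sufficient statistics for fitting y = intercept + slope * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "RegressionStats") -> "RegressionStats":
        # Statistics are plain sums, so partitions combine by addition.
        return RegressionStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def aggregate_partition(rows: Iterable[Tuple[float, float]]) -> RegressionStats:
    """Reduce one partition of (x, y) rows to five numbers (assumed to run near the data)."""
    n = 0
    sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return RegressionStats(n, sx, sy, sxx, sxy)


def fit(stats: RegressionStats) -> Tuple[float, float]:
    """Return (intercept, slope) from merged statistics via the normal equations."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x values are constant or no data; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


def fit_partitions(partitions: Sequence[Iterable[Tuple[float, float]]]) -> Tuple[float, float]:
    """Aggregate each partition independently, merge the results, then fit once."""
    total = RegressionStats()
    for rows in partitions:
        total = total.merge(aggregate_partition(rows))
    return fit(total)
```

Because the statistics are additive, the per-partition step can run independently on each data slice and the merge order does not matter, which is what makes this kind of early aggregation attractive when transfer cost dominates.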
Pages: 828-837
Page count: 10