Case Study of an On-premise Data Warehouse Configuration

Authors
Bogdandy, Bence [1]
Kovacs, Adam [2]
Toth, Zsolt [1]
Affiliations
[1] Eszterhazy Karoly Univ, Inst Computat Sci, Eger, Hungary
[2] Eszterhazy Karoly Univ, Eger, Hungary
DOI
10.1109/coginfocom50765.2020.9237814
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
The development of machine learning over the years has facilitated the joint upsurge of complex cognitive infocommunication systems. Machine learning methods are vital elements of modern cognitive infocommunication systems because they can be used in various ways, such as behavior modeling or sentiment analysis. Machine learning algorithms require a reliable infrastructure and vast amounts of data. Therefore, building data warehouse systems is one of the essential steps of building reliable cognitive infocommunication systems. Finding and preprocessing data streams of different origins are the first steps in the creation of a data warehouse. Unfortunately, online data streams most often have their own unique formats. Therefore, the obtained data sets must be transformed into a unified data model. The modelling and conversion of data sources serve as a key step in the unification of heterogeneous data. Storage should be persistent and optimized for the analytical processing of data. These requirements raise technological challenges that are not common during the design of data sources. This paper gives an overview of current data warehouse technologies and suggests an infrastructure implementation. Hive is used for accessing, modifying, and running complex analytics on the stored data sets. Economic data can often be unique to the product or the industry it covers. Different data sources use unique data formats that are tailored to their application area or needs. Moreover, some of these data sources may change their format over time. Therefore, a flexible data transformation step is required that can be configured easily. The ETL processes of the data sources are implemented in Python and Hive. The data is loaded into a Hive data warehouse, which stores data in the distributed Hadoop File System.
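The configurable transformation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unified field names (`symbol`, `date`, `price`), the two source layouts, and the `SOURCE_CONFIGS` and `transform` names are all hypothetical, chosen only to show how a per-source mapping could turn differently formatted records into one data model.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Tuple

# Hypothetical per-source configuration (not from the paper):
# unified field name -> (field name in the source, converter function).
FieldRule = Tuple[str, Callable[[Any], Any]]

SOURCE_CONFIGS: Dict[str, Dict[str, FieldRule]] = {
    "source_a": {
        "symbol": ("ticker", str.upper),
        "date": ("day", lambda s: datetime.strptime(s, "%Y-%m-%d").date().isoformat()),
        "price": ("close", float),
    },
    "source_b": {
        "symbol": ("Symbol", str.upper),
        "date": ("Date", lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat()),
        # This assumed source writes decimals with a comma.
        "price": ("Price", lambda s: float(s.replace(",", "."))),
    },
}


def transform(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Map one raw record from `source` onto the unified field names."""
    rules = SOURCE_CONFIGS[source]
    return {
        unified: convert(record[raw_name])
        for unified, (raw_name, convert) in rules.items()
    }


if __name__ == "__main__":
    row_a = {"ticker": "abc", "day": "2020-09-23", "close": "12.5"}
    row_b = {"Symbol": "abc", "Date": "23/09/2020", "Price": "12,5"}
    print(transform(row_a, "source_a"))
    print(transform(row_b, "source_b"))
```

Under this design, a format change in one source only requires editing that source's entry in the configuration; the transformation function itself stays untouched.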
Pages: 179-184
Number of pages: 6