Case Study of an On-premise Data Warehouse Configuration

Authors
Bogdandy, Bence [1]
Kovacs, Adam [2]
Toth, Zsolt [1]
Affiliations
[1] Eszterhazy Karoly Univ, Inst Computat Sci, Eger, Hungary
[2] Eszterhazy Karoly Univ, Eger, Hungary
DOI
10.1109/coginfocom50765.2020.9237814
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
The development of machine learning over the years has facilitated the rise of complex cognitive infocommunication systems. Machine learning methods are vital elements of modern cognitive infocommunication systems because they can be used in various ways, such as behavior modeling or sentiment analysis. Machine learning algorithms require a reliable infrastructure and a vast amount of data. Therefore, building data warehouse systems is one of the essential steps of building reliable cognitive infocommunication systems. Finding and preprocessing data streams of different origins are the first steps in the creation of a data warehouse. Unfortunately, online data streams most often have their own unique formats, so the obtained data sets must be transformed into a unified data model. The modelling and conversion of data sources serve as a key step in the unification of heterogeneous data. Storage should be persistent and optimized for the analytical processing of data. These requirements raise technological challenges that are not common during the design of data sources. This paper gives an overview of current data warehouse technologies and suggests an infrastructure implementation. Hive is used for accessing, modifying, and running complex analytics on the stored data sets. Economic data can often be unique to the product or the industry it covers. Different data sources use unique data formats that are tailored to their application area or needs. Moreover, some of these data sources may change their format over time. Therefore, a flexible data transformation step is required that can be configured easily. The ETL processes of the data sources are implemented in Python and Hive. The data is loaded into a Hive data warehouse, which stores data in the Hadoop Distributed File System (HDFS).
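The configurable transformation step described in the abstract could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the source names (`source_a`, `source_b`), the field names, the date formats, and the helpers `to_unified` and `to_tsv_line` are all hypothetical and do not come from the paper.

```python
from datetime import datetime

# Hypothetical per-source configuration (not from the paper): each unified
# field name maps to (field name in the source record, converter function).
# Supporting a new or changed source format only requires editing this table.
SOURCE_CONFIGS = {
    "source_a": {
        "date": ("Date", lambda s: datetime.strptime(s, "%Y-%m-%d").date().isoformat()),
        "symbol": ("Ticker", str.upper),
        "price": ("Close", float),
    },
    "source_b": {
        "date": ("timestamp", lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat()),
        "symbol": ("sym", str.upper),
        # This assumed source writes decimals with a comma.
        "price": ("last_price", lambda s: float(s.replace(",", "."))),
    },
}


def to_unified(record, source):
    """Convert one raw record from `source` into the unified data model."""
    config = SOURCE_CONFIGS[source]
    return {
        field: convert(record[src_field])
        for field, (src_field, convert) in config.items()
    }


def to_tsv_line(row, columns=("date", "symbol", "price")):
    """Serialize a unified row as one tab-delimited text line."""
    return "\t".join(str(row[c]) for c in columns)


raw_a = {"Date": "2020-09-23", "Ticker": "abc", "Close": "12.5"}
raw_b = {"timestamp": "23/09/2020", "sym": "abc", "last_price": "12,5"}

# Both differently formatted records end up in the same unified shape.
unified_a = to_unified(raw_a, "source_a")
unified_b = to_unified(raw_b, "source_b")
line = to_tsv_line(unified_a)
```

Tab-delimited lines of this kind are one format a Hive table can read when it is declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`; whether the paper uses this storage format is not stated in the abstract.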
Pages: 179-184
Page count: 6