Case Study of an On-premise Data Warehouse Configuration

Cited by: 0
Authors
Bogdandy, Bence [1]
Kovacs, Adam [2]
Toth, Zsolt [1]
Affiliations
[1] Eszterhazy Karoly Univ, Inst Computat Sci, Eger, Hungary
[2] Eszterhazy Karoly Univ, Eger, Hungary
Keywords
DOI
10.1109/coginfocom50765.2020.9237814
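Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809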
Abstract
Advances in machine learning over the years have facilitated the rise of complex cognitive infocommunication systems. Machine learning methods are vital elements of modern cognitive infocommunication systems because they can be used in various ways, such as behavior modeling or sentiment analysis. Machine learning algorithms require a reliable infrastructure and a vast amount of data. Therefore, building data warehouse systems is one of the essential steps of building reliable cognitive infocommunication systems. Finding and preprocessing data streams of different origins are the first steps in the creation of a data warehouse. Unfortunately, online data streams are most often formatted uniquely, so the obtained data sets must be transformed into a unified data model. The modelling and conversion of data sources serve as a key step in the unification of heterogeneous data. Storage should be persistent and optimized for the analytical processing of data. These requirements raise technological challenges that are not common during the design of data sources. This paper gives an overview of current data warehouse technologies and suggests an infrastructure implementation. Hive is used for accessing, modifying, and running complex analytics on the stored data sets. Economic data can often be unique to the product or the industry it covers. Different data sources use unique data formats that were tailored to their application area or needs. Moreover, some of these data sources may change their format over time. Therefore, a flexible, easily configurable data transformation step is required. The ETL processes of the data sources are implemented in Python and Hive. The data is loaded into a Hive data warehouse, which stores data in the Hadoop Distributed File System (HDFS).
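The configurable transformation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the source names, field names, date formats, and the unified schema (`symbol`, `date`, `price`, `source`) are all hypothetical and chosen only to show how a per-source configuration can map differently formatted records onto one data model.

```python
from datetime import datetime

# Hypothetical per-source configurations (not taken from the paper).
# Each entry maps source-specific field names to the unified model and
# records the date format that source uses.
SOURCE_CONFIGS = {
    "source_a": {
        "fields": {"Ticker": "symbol", "Date": "date", "Close": "price"},
        "date_format": "%Y-%m-%d",
    },
    "source_b": {
        "fields": {"sym": "symbol", "ts": "date", "last_price": "price"},
        "date_format": "%d/%m/%Y",
    },
}

# Column order of the assumed unified model.
UNIFIED_COLUMNS = ("symbol", "date", "price", "source")


def to_unified(record, source):
    """Convert one raw record into the unified model using its source config."""
    config = SOURCE_CONFIGS[source]
    # Rename source-specific fields to the unified field names.
    unified = {target: record[src] for src, target in config["fields"].items()}
    # Normalize the date to ISO 8601 and the price to a float.
    parsed = datetime.strptime(unified["date"], config["date_format"])
    unified["date"] = parsed.date().isoformat()
    unified["price"] = float(unified["price"])
    unified["source"] = source
    return unified


def to_tsv_line(unified):
    """Serialize a unified record as one tab-separated line."""
    return "\t".join(str(unified[column]) for column in UNIFIED_COLUMNS)


if __name__ == "__main__":
    raw_a = {"Ticker": "ABC", "Date": "2020-09-23", "Close": "101.5"}
    raw_b = {"sym": "ABC", "ts": "23/09/2020", "last_price": "101.5"}
    print(to_tsv_line(to_unified(raw_a, "source_a")))
    print(to_tsv_line(to_unified(raw_b, "source_b")))
```

With this design, a change in a source's format would only require editing its configuration entry rather than the transformation code. The tab-separated output is one plausible hand-off format, since a Hive table can be declared with tab-delimited fields; the loading mechanism actually used in the paper is not stated in the abstract.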
Pages: 179-184
Page count: 6