Case Study of an On-premise Data Warehouse Configuration

Authors:

Bogdandy, Bence [1]

Kovacs, Adam [2]

Toth, Zsolt [1]

Affiliations:

[1] Eszterhazy Karoly Univ, Inst Computat Sci, Eger, Hungary

[2] Eszterhazy Karoly Univ, Eger, Hungary

Source:

2020 11TH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2020) | 2020

DOI:

10.1109/coginfocom50765.2020.9237814

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

The development of machine learning over the years has driven a parallel rise in complex cognitive infocommunication systems. Machine learning methods are vital elements of modern cognitive infocommunication systems because they can be used in various ways, such as behavior modeling or sentiment analysis. Machine learning algorithms require a reliable infrastructure and vast amounts of data. Therefore, building data warehouse systems is one of the essential steps of building reliable cognitive infocommunication systems. Finding and preprocessing data streams of different origins are the first steps in the creation of a data warehouse. Unfortunately, online data streams most often have their own unique formats, so the obtained data sets must be transformed into a unified data model. The modeling and conversion of data sources is a key step in the unification of heterogeneous data. Storage should be persistent and optimized for the analytical processing of data. These requirements raise technological challenges that are not common in the design of data sources. This paper gives an overview of current data warehouse technologies and suggests an infrastructure implementation. Hive is used for accessing, modifying, and running complex analytics on the stored data sets. Economic data can often be unique to the product or the industry it covers. Different data sources use unique data formats tailored to their application area or needs. Moreover, some of these data sources may change their format over time. Therefore, a flexible and easily configurable data transformation step is required. The ETL processes of the data sources are implemented in Python and Hive. The data is loaded into a Hive data warehouse, which stores data in the Hadoop Distributed File System (HDFS).
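The abstract describes a configurable transformation step that maps heterogeneous source formats onto a unified data model. The following is only a minimal sketch of that idea in Python and is not the authors' implementation. The source names, field names, date formats, and the unified columns (`symbol`, `price`, `date`, `source`) are all hypothetical and chosen purely for illustration.

```python
from datetime import datetime

# Hypothetical unified data model: every source is mapped onto these columns.
UNIFIED_COLUMNS = ["symbol", "price", "date", "source"]

# Hypothetical per-source configuration. If a source changes its format,
# only its entry here needs to change, not the transformation code.
SOURCE_CONFIGS = {
    "source_a": {
        "fields": {"symbol": "ticker", "price": "close", "date": "day"},
        "date_format": "%Y-%m-%d",
    },
    "source_b": {
        "fields": {"symbol": "Symbol", "price": "LastPrice", "date": "TradeDate"},
        "date_format": "%d/%m/%Y",
    },
}


def to_unified(record, source):
    """Map one raw record from the named source onto the unified model."""
    cfg = SOURCE_CONFIGS[source]
    fields = cfg["fields"]
    parsed_date = datetime.strptime(record[fields["date"]], cfg["date_format"])
    return {
        "symbol": str(record[fields["symbol"]]).strip().upper(),
        "price": float(record[fields["price"]]),
        "date": parsed_date.date().isoformat(),
        "source": source,
    }


def to_tsv_line(row):
    """Serialize a unified row as one tab-delimited line."""
    return "\t".join(str(row[column]) for column in UNIFIED_COLUMNS)


if __name__ == "__main__":
    raw = {"Symbol": "xyz", "LastPrice": "3", "TradeDate": "23/09/2020"}
    print(to_tsv_line(to_unified(raw, "source_b")))
```

Keeping the field mapping in configuration rather than in code is one way to absorb format changes in a source. Tab-delimited lines like these could, for example, be written to a file and loaded into a Hive table declared with tab-terminated fields; that loading step is not shown here.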

Pages: 179-184

Page count: 6
