Big Data Pipeline with ML-based and Crowd Sourced Dynamically Created and Maintained Columnar Data Warehouse for Structured and Unstructured Big Data

Cited by: 2
Authors
Ghane, Kamran [1]
Affiliations
[1] Anagira Inc, Los Angeles, CA 90013 USA
DOI
10.1109/ICICT50521.2020.00018
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Existing big data platforms pass data through distributed processing platforms and store it in a data lake. Architectures such as Lambda and Kappa address both real-time and batch processing of data. Such systems provide real-time analytics on the raw data and delayed analytics on the curated data. Data denormalization and the creation and maintenance of a columnar dimensional data warehouse are usually time-consuming, with little or no support for unstructured data. The system introduced in this paper automatically creates and dynamically maintains its data warehouse as part of its big data pipeline, in addition to its data lake. It builds the data warehouse from structured, semi-structured, and unstructured data. It uses machine learning to identify and create dimensions. It also establishes relations among data from different data sources and creates the corresponding dimensions. It dynamically optimizes the dimensions based on crowdsourced data provided by end users and on query analysis.
Pages: 60-67
Page count: 8
Related papers
50 in total
  • [21] Performance Analysis of Machine Learning Algorithms for Big Data Classification: ML and AI-Based Algorithms for Big Data Analysis
    Punia, Sanjeev Kumar
    Kumar, Manoj
    Stephan, Thompson
    Deverajan, Ganesh Gopal
    Patan, Rizwan
    [J]. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS, 2021, 12 (04) : 60 - 75
  • [22] Estimate Air Quality Based on Mobile Crowd Sensing and Big Data
    Feng, Cheng
    Wang, Wendong
    Tian, Ye
    Que, Xirong
    Gong, Xiangyang
    [J]. 2017 IEEE 18TH INTERNATIONAL SYMPOSIUM ON A WORLD OF WIRELESS, MOBILE AND MULTIMEDIA NETWORKS (WOWMOM), 2017,
  • [23] Application of Big Data in College Student Education Management Based on Data Warehouse Technology and Integrated Learning
    Zhou, Junping
    Li, Xueyuan
    [J]. INTERNATIONAL JOURNAL OF E-COLLABORATION, 2024, 20 (01)
  • [24] Intelligent Urban Transport Decision Analysis System Based on Mining in Big Data Analytics and Data Warehouse
    Addakiri, Khaoula
    Khallouki, Hajar
    Bahaj, Mohamed
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT, AI2SD'2019, VOL 6: ADVANCED INTELLIGENT SYSTEMS FOR NETWORKS AND SYSTEMS, 2020, 92 : 179 - 184
  • [25] Research on Real-time Processing and Stream Analysis of Unstructured Data Based on Big Data Platforms
    Liang, Huichao
    Wang, Di
    Liu, Yuan
    Mei, Lin
    Zhou, Mengxue
    Zhao, Haibin
    [J]. PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 96 - 101
  • [26] Ontology-Based Big Dimension Modeling in Data Warehouse Schema Design
    Liu, Xiufeng
    Iftikhar, Nadeem
    [J]. BUSINESS INFORMATION SYSTEMS, BIS 2013, 2013, 157 : 75 - 87
  • [27] Pipeline-Based Linear Scheduling of Big Data Streams in the Cloud
    Tantalaki, Nicoleta
    Souravlas, Stavros
    Roumeliotis, Manos
    Katsavounis, Stefanos
    [J]. IEEE ACCESS, 2020, 8 : 117182 - 117202
  • [28] Crowd Sensing of Urban Emergency Events based on Social Media Big Data
    Xu, Zheng
    Zhang, Hui
    Liu, Yunhuai
    Mei, Lin
    [J]. 2014 IEEE 13TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM), 2014, : 605 - 610
  • [29] A MapReduce-based scalable discovery and indexing of structured big data
    Singh, Hari
    Bawa, Seema
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 73 : 32 - 43
  • [30] Analysis method for structured big data feature based on hypernetwork model
    Xu, Shu
    [J]. INTERNATIONAL JOURNAL OF INTERNET PROTOCOL TECHNOLOGY, 2021, 14 (03) : 162 - 168