Big Data Pipeline with ML-based and Crowd Sourced Dynamically Created and Maintained Columnar Data Warehouse for Structured and Unstructured Big Data

Cited by: 2
Authors
Ghane, Kamran [1]
Affiliation
[1] Anagira Inc, Los Angeles, CA 90013 USA
DOI
10.1109/ICICT50521.2020.00018
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Existing big data platforms pass data through distributed processing platforms and store it in a data lake. Architectures such as Lambda and Kappa address real-time and batch processing of data. Such systems provide real-time analytics on raw data and delayed analytics on curated data. Data denormalization and the creation and maintenance of a columnar dimensional data warehouse are usually time-consuming, with little or no support for unstructured data. The system introduced in this paper automatically creates and dynamically maintains its data warehouse as part of its big data pipeline, in addition to its data lake. It builds the data warehouse on structured, semi-structured, and unstructured data. It uses machine learning to identify and create dimensions. It also establishes relations among data from different data sources and creates the corresponding dimensions. It dynamically optimizes the dimensions based on crowd-sourced data provided by end users and on query analysis.
Pages: 60-67
Number of pages: 8
Related papers
50 in total
  • [1] Structured and Unstructured Big Data Analytics
    Mishra, Suyash
    Misra, Anuranjan
    [J]. 2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 740 - 746
  • [2] Associated Index for Big Structured and Unstructured Data
    Zhu, Chunying
    Li, Qingzhong
    Kong, Lanju
    Wang, Xiangwei
    Hong, Xiaoguang
    [J]. WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 567 - 570
  • [3] Big Data Warehouse: Building Columnar NoSQL OLAP Cubes
    Dehdouh, Khaled
    Boussaid, Omar
    Bentayeb, Fadila
    [J]. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY, 2020, 12 (01) : 1 - 24
  • [4] A Model for Enhancing Unstructured Big Data Warehouse Execution Time
    Farhan, Marwa Salah
    Youssef, Amira
    Abdelhamid, Laila
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (02)
  • [5] Political Science and Big Data: Structured Data, Unstructured Data, and How to Use Them
    Grossman, Jonathan
    Pedahzur, Ami
    [J]. POLITICAL SCIENCE QUARTERLY, 2020, 135 (02) : 225 - 257
  • [6] Big Data ML-Based Fake News Detection Using Distributed Learning
    Altheneyan, Alaa
    Alhadlaq, Aseel
    [J]. IEEE ACCESS, 2023, 11 : 29447 - 29463
  • [7] Atrak: a MapReduce-based data warehouse for big data
    Barkhordari, Mohammadhossein
    Niamanesh, Mahdi
    [J]. JOURNAL OF SUPERCOMPUTING, 2017, 73 (10) : 4596 - 4610
  • [8] Atrak: a MapReduce-based data warehouse for big data
    Mohammadhossein Barkhordari
    Mahdi Niamanesh
    [J]. The Journal of Supercomputing, 2017, 73 : 4596 - 4610
  • [9] A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing
    Liu, Xiao
    Singh, Param Vir
    Srinivasan, Kannan
    [J]. MARKETING SCIENCE, 2016, 35 (03) : 363 - 388
  • [10] Collaborative Merging of Radio SLAM Maps in View of Crowd-sourced Data Acquisition and Big Data
    Batstone, Kenneth
    Oskarsson, Magnus
    Astrom, Kalle
    [J]. ICPRAM: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2019, : 807 - 813