A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Authors
Farhan, Marwa Salah [1, 2]
Youssef, Amira [1, 3]
Abdelhamid, Laila [1]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Dept Informat Syst, Cairo 11795, Egypt
[2] British Univ Egypt, Fac Informat & Comp Sci, Cairo 11837, Egypt
[3] Higher Inst Comp Sci & Informat Syst, Dept Comp Sci, Settlement 5, Cairo 11835, Egypt
Keywords
big data; unstructured data warehouse; ELT; ETL
DOI
10.3390/bdcc8020017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by current applications requires new data warehousing systems. In the big data setting, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper presents a new four-layer model, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, and this is evaluated experimentally. ECLT is applied and tested using Spark, a framework used here from Python. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
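The four-layer ordering named in the abstract (extract, clean, load, transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract reports that ECLT was tested with Spark, whereas the sketch below uses plain Python so it runs without a cluster. The function names, the specific cleaning rules (lowercasing, stripping punctuation, dropping empty records), the in-memory list standing in for the warehouse, and the word-count transform are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def extract(sources):
    """Extract: pull raw text records from the (hypothetical) sources."""
    return [text for text in sources if text is not None]


def clean(records):
    """Clean: lowercase, replace punctuation with spaces, collapse whitespace.

    These rules are assumed for illustration; the abstract does not list
    the cleaning steps used in ECLT.
    """
    cleaned = []
    for text in records:
        text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
        text = re.sub(r"\s+", " ", text).strip()
        if text:  # drop records that are empty after cleaning
            cleaned.append(text)
    return cleaned


def load(records, warehouse):
    """Load: store the cleaned records before any transformation runs."""
    warehouse.extend(records)
    return warehouse


def transform(warehouse):
    """Transform: derive an analysis-ready structure from the loaded data.

    Word counts are a stand-in for whatever transformation is needed.
    """
    counts = Counter()
    for text in warehouse:
        counts.update(text.split())
    return counts


def run_eclt(sources):
    """Run the four stages in ECLT order on an in-memory 'warehouse'."""
    warehouse = []  # hypothetical stand-in for the real target store
    load(clean(extract(sources)), warehouse)
    return transform(warehouse)


word_counts = run_eclt(
    ["Big Data, big   warehouses!", None, "   ", "Data: unstructured text."]
)
```

The point of the sketch is only the stage order: cleaning happens before loading, so the store holds only usable text, and transformation runs last on data that is already loaded.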
Pages: 26