A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Cited by: 0
Authors
Farhan, Marwa Salah [1, 2]
Youssef, Amira [1, 3]
Abdelhamid, Laila [1 ]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Dept Informat Syst, Cairo 11795, Egypt
[2] British Univ Egypt, Fac Informat & Comp Sci, Cairo 11837, Egypt
[3] Higher Inst Comp Sci & Informat Syst, Dept Comp Sci, Settlement 5, Cairo 11835, Egypt
Keywords
big data; unstructured data warehouse; ELT; ETL;
DOI
10.3390/bdcc8020017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems, and in the big data setting existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper proposes a new four-layer model, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, and this is evaluated experimentally. ECLT is implemented and tested using Spark from Python. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
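The four-layer ordering named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses Spark, whereas the block below is plain Python, and every function name, the in-memory list standing in for the warehouse, and the specific cleaning and transformation rules are assumptions chosen only to show the Extract, Clean, Load, Transform sequence and how an end-to-end execution time might be measured.

```python
import re
import time


def extract(raw_docs):
    """Extract: read raw text records from a source (here, an in-memory list)."""
    return list(raw_docs)


def clean(docs):
    """Clean: strip tag-like debris, collapse whitespace, drop empty records.

    The cleaning rules are illustrative assumptions, not the paper's rules.
    """
    cleaned = []
    for doc in docs:
        text = re.sub(r"<[^>]+>", " ", doc)       # remove markup-like tags
        text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
        if text:
            cleaned.append(text)
    return cleaned


def load(docs, warehouse):
    """Load: append cleaned records to the target store (a list stands in for it)."""
    warehouse.extend(docs)
    return warehouse


def transform(warehouse):
    """Transform: derive a simple structured view after loading (word counts)."""
    return [{"text": doc, "n_words": len(doc.split())} for doc in warehouse]


def run_eclt(raw_docs):
    """Run the four stages in ECLT order and return (result, elapsed seconds)."""
    start = time.perf_counter()
    warehouse = load(clean(extract(raw_docs)), [])
    result = transform(warehouse)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    rows, seconds = run_eclt(["<p>Hello   world</p>", "   ", "big data"])
    print(rows)
    print(f"elapsed: {seconds:.6f} s")
```

The point of the ordering in this sketch is that cleaning happens before loading, so empty or noisy records never reach the store, while transformation runs afterwards on the loaded data. Timings from such a toy are not comparable to the figures reported in the abstract.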
Pages: 26
Related papers
50 in total
  • [21] Enhancing an Enterprise Data Warehouse with a data dictionary
    Lau, LM
    Lam, SH
    Barlow, S
    Lyon, C
    Sanders, D
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2001, : 951 - 951
  • [22] Unstructured Data Treatment for Big Data Solutions
    Sato, Shintaro
    Kayahara, Akihiro
    Imai, Shin-ichi
    [J]. INTERNATIONAL SYMPOSIUM ON SEMICONDUCTOR MANUFACTURING (ISSM) 2016 PROCEEDINGS OF TECHNICAL PAPERS, 2016,
  • [23] Development of Usability Enhancement Model for Unstructured Big Data Using SLR
    Adnan, Kiran
    Akbar, Rehan
    Wang, Khor Siak
    [J]. IEEE ACCESS, 2021, 9 : 87391 - 87409
  • [24] Big data execution time based on Spark Machine Learning Libraries
    Garate-Escamilla, Anna Karen
    Hajjam El Hassani, Amir
    Andres, Emmanuel
    [J]. PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON CLOUD AND BIG DATA COMPUTING (ICCBDC 2019), 2019, : 78 - 83
  • [25] An Approach to Security for Unstructured Big Data
    Md. Ezazul Islam
    Md. Rafiqul Islam
    A B M Shawkat Ali
    [J]. The Review of Socionetwork Strategies, 2016, 10 (2) : 105 - 123
  • [26] Structured and Unstructured Big Data Analytics
    Mishra, Suyash
    Misra, Anuranjan
    [J]. 2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 740 - 746
  • [27] ExNav: An Interactive Big Data Exploration Framework for Big Unstructured Data
    Ge, Xiaoyu
    Zhang, Xiaozhong
    Chrysanthis, Panos K.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 503 - 512
  • [28] An Approach to Security for Unstructured Big Data
    Islam, Md. Ezazul
    Islam, Md. Rafiqul
    Ali, A. B. M. Shawkat
    [J]. REVIEW OF SOCIONETWORK STRATEGIES, 2016, 10 (02): : 105 - 123
  • [29] Unstructured Data Service Model Utilizing Context-Aware Big Data Analysis
    Kim, Yonghoon
    Chung, Mokdong
    [J]. ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2017, 421 : 926 - 931
  • [30] Data Warehouse with Big Data Technology for Higher Education
    Santoso, Leo Willyanto
    Yulia
    [J]. 4TH INFORMATION SYSTEMS INTERNATIONAL CONFERENCE (ISICO 2017), 2017, 124 : 93 - 99