A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Cited by: 0
Authors
Farhan, Marwa Salah [1, 2]
Youssef, Amira [1, 3]
Abdelhamid, Laila [1]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Dept Informat Syst, Cairo 11795, Egypt
[2] British Univ Egypt, Fac Informat & Comp Sci, Cairo 11837, Egypt
[3] Higher Inst Comp Sci & Informat Syst, Dept Comp Sci, Settlement 5, Cairo 11835, Egypt
Keywords
big data; unstructured data warehouse; ELT; ETL
DOI
10.3390/bdcc8020017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems, and in the big data setting existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper proposes a new four-layer model, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, as assessed through experiments. ECLT is implemented and tested using Spark from Python. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results show that for a data size of 1 TB, the execution time of ECLT is 41.8 s, and that when the data size increases to 1 million articles, the execution time is 119.6 s. These findings indicate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
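The abstract names the four ECLT layers but does not describe their internals, so the following is only a minimal sketch of the stage ordering (Extract, Clean, Load, Transform) applied to text. It uses plain Python rather than Spark so that it runs without a cluster. All function names, the in-memory list standing in for the warehouse, and the word-count transform are assumptions made for illustration, not the authors' implementation.

```python
import re
from collections import Counter


def extract(sources):
    """Extract: gather raw text records from the (hypothetical) sources."""
    return [text for source in sources for text in source]


def clean(records):
    """Clean: collapse whitespace, lowercase, and drop empty records."""
    cleaned = []
    for text in records:
        text = re.sub(r"\s+", " ", text).strip().lower()
        if text:
            cleaned.append(text)
    return cleaned


def load(records, warehouse):
    """Load: append cleaned records to an in-memory stand-in for the warehouse."""
    warehouse.extend(records)
    return warehouse


def transform(warehouse):
    """Transform: run after loading; here, an assumed word-count aggregation."""
    counts = Counter()
    for text in warehouse:
        counts.update(re.findall(r"[a-z0-9]+", text))
    return counts


def run_eclt(sources):
    """Run the four stages in ECLT order and return the store and the result."""
    warehouse = []
    load(clean(extract(sources)), warehouse)
    return warehouse, transform(warehouse)


if __name__ == "__main__":
    store, word_counts = run_eclt([["  Big   Data ", ""], ["big\twarehouse\n"]])
    print(store)
    print(word_counts.most_common(3))
```

The point of the ordering is that cleaning happens before loading, so only reduced text reaches storage, and the heavier transformation runs on data that is already loaded. Whether this matches the paper's stage definitions cannot be confirmed from the abstract alone.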
Pages: 26
Related papers
50 in total
  • [41] ENHANCING THE ETL PROCESS IN DATA WAREHOUSE SYSTEMS
    Petre, Ruxandra
    [J]. PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON INFORMATICS IN ECONOMY (IE 2015): EDUCATION, RESEARCH & BUSINESS TECHNOLOGIES, 2015, : 392 - 397
  • [42] Efficient Data Management Tools for the Heterogeneous Big Data Warehouse
    Alekseev, A. A.
    Osipova, V. V.
    Ivanov, M. A.
    Klimentov, A.
    Grigorieva, N. V.
    Nalamwar, H. S.
    [J]. PHYSICS OF PARTICLES AND NUCLEI LETTERS, 2016, 13 (05) : 689 - 692
  • [43] Managing Unstructured Big Data in Healthcare System
    Kong, Hyoun-Joong
    [J]. HEALTHCARE INFORMATICS RESEARCH, 2019, 25 (01) : 1 - 2
  • [44] Big Data Warehouse for Healthcare-Sensitive Data Applications
    Shahid, Arsalan
    Nguyen, Thien-An Ngoc
    Kechadi, M-Tahar
    [J]. SENSORS, 2021, 21 (07)
  • [45] Associated Index for Big Structured and Unstructured Data
    Zhu, Chunying
    Li, Qingzhong
    Kong, Lanju
    Wang, Xiangwei
    Hong, Xiaoguang
    [J]. WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 567 - 570
  • [46] Role and Challenges of Unstructured Big Data in Healthcare
    Adnan, Kiran
    Akbar, Rehan
    Khor, Siak Wang
    Ali, Adnan Bin Amanat
    [J]. DATA MANAGEMENT, ANALYTICS AND INNOVATION, ICDMAI 2019, VOL 1, 2020, 1042 : 301 - 323
  • [47] An approach to provide security to unstructured Big Data
    Islam, Md. Rafiqul
    Islam, Md. Ezazul
    [J]. 8TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA 2014), 2014
  • [48] Monitoring and Control of Unstructured Manufacturing Big Data
    Cui, Yesheng
    Kara, Sami
    Chan, Ka C.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT (IEEE IEEM), 2020, : 928 - 932
  • [49] Unstructured medical frameworks using big data
    Banu, A. Arjuman
    Reshmy, A. K.
    [J]. RESEARCH JOURNAL OF PHARMACEUTICAL BIOLOGICAL AND CHEMICAL SCIENCES, 2016, 7 : 234 - 241
  • [50] USING NoSQL FOR PROCESSING UNSTRUCTURED BIG DATA
    Balakayeva, G. T.
    Phillips, C.
    Darkenbayev, D. K.
    Turdaliyev, M.
    [J]. NEWS OF THE NATIONAL ACADEMY OF SCIENCES OF THE REPUBLIC OF KAZAKHSTAN-SERIES OF GEOLOGY AND TECHNICAL SCIENCES, 2019, (06) : 12 - 21