Improving Pipelining Tools for Pre-processing Data

Cited by: 1
Authors
Novo-Loures, Maria [1, 2, 3]
Lage, Yeray [1]
Pavon, Reyes [1, 2, 3]
Laza, Rosalia [1, 2, 3]
Ruano-Ordas, David [1, 2, 3]
Ramon Mendez, Jose [1, 2, 3]
Affiliations
[1] Univ Vigo, Dept Comp Sci, ESEI Escuela Super Ingn Informat, Edificio Politecn,Campus Univ Lagoas S-N, Orense 32004, Spain
[2] Univ Vigo, Dept Comp Sci, Res Grp SI4, CINBIO, Orense 32004, Spain
[3] SERGAS UVIGO, Galicia Sur Hlth Res Inst IIS Galicia Sur, SING Res Grp, Vigo, Spain
Keywords
Burst Processing; Data Pre-processing; Java; Pipeline Frameworks; Optimization
DOI
10.9781/ijimai.2021.10.004
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The last several years have seen the emergence of data mining and its transformation into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data and, as a result, several important companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for the development of computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are rarely present in these frameworks. In this context, this study introduces several improvements: the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of the burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
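The features named in the abstract can be illustrated with a minimal Java sketch. It is not the BDP4J API: the names `PipelineSketch`, `Instance`, `Task`, `checkTypes`, `runBurst`, and `demoTasks` are hypothetical and introduced here only for illustration. The sketch also assumes that "burst processing" means pushing a whole batch of instances through one task before the next task starts, and it models only one kind of constraint (type compatibility between consecutive tasks).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these names come from BDP4J.
public class PipelineSketch {

    /** A data item flowing through the pipeline; it can be marked invalid. */
    static final class Instance {
        Object data;
        private boolean valid = true;

        Instance(Object data) { this.data = data; }
        void invalidate() { valid = false; }
        boolean isValid() { return valid; }
    }

    /** One pipeline step with declared input and output types. */
    static final class Task {
        final String name;
        final Class<?> inputType;
        final Class<?> outputType;
        final UnaryOperator<Object> body;

        Task(String name, Class<?> inputType, Class<?> outputType,
             UnaryOperator<Object> body) {
            this.name = name;
            this.inputType = inputType;
            this.outputType = outputType;
            this.body = body;
        }
    }

    /** Early error detection: verify the task chain before touching any data. */
    static void checkTypes(List<Task> tasks, Class<?> sourceType) {
        Class<?> current = sourceType;
        for (Task t : tasks) {
            if (!t.inputType.isAssignableFrom(current)) {
                throw new IllegalStateException("Task '" + t.name + "' expects "
                        + t.inputType.getSimpleName() + " but would receive "
                        + current.getSimpleName());
            }
            current = t.outputType;
        }
    }

    /** Assumed burst scheme: each task handles the whole batch in turn. */
    static List<Instance> runBurst(List<Task> tasks, List<Instance> batch) {
        for (Task t : tasks) {
            for (Instance i : batch) {
                if (!i.isValid()) {
                    continue; // already invalidated by an earlier task
                }
                try {
                    i.data = t.body.apply(i.data);
                } catch (RuntimeException e) {
                    i.invalidate(); // drop the erroneous instance, keep going
                }
            }
        }
        List<Instance> survivors = new ArrayList<>();
        for (Instance i : batch) {
            if (i.isValid()) {
                survivors.add(i);
            }
        }
        return survivors;
    }

    /** Two example tasks: trim a string, then parse it as an integer. */
    static List<Task> demoTasks() {
        return Arrays.asList(
                new Task("trim", String.class, String.class,
                        o -> ((String) o).trim()),
                new Task("parse", String.class, Integer.class,
                        o -> Integer.valueOf((String) o)));
    }

    public static void main(String[] args) {
        List<Task> tasks = demoTasks();
        checkTypes(tasks, String.class); // fails fast on a badly ordered chain
        List<Instance> batch = new ArrayList<>(Arrays.asList(
                new Instance(" 12 "), new Instance("abc"), new Instance("7")));
        for (Instance i : runBurst(tasks, batch)) {
            System.out.println(i.data);
        }
    }
}
```

In this sketch a wrongly ordered chain is rejected by `checkTypes` before any instance is processed, while a malformed input such as `"abc"` is invalidated during the run without stopping the rest of the batch.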
Pages: 214-224
Page count: 11