Improving Pipelining Tools for Pre-processing Data

Cited by: 1
Authors
Novo-Loures, Maria [1, 2, 3]
Lage, Yeray [1]
Pavon, Reyes [1, 2, 3]
Laza, Rosalia [1, 2, 3]
Ruano-Ordas, David [1, 2, 3]
Ramon Mendez, Jose [1, 2, 3]
Affiliations
[1] Univ Vigo, Dept Comp Sci, ESEI Escuela Super Ingn Informat, Edificio Politecn,Campus Univ Lagoas S-N, Orense 32004, Spain
[2] Univ Vigo, Dept Comp Sci, Res Grp SI4, CINBIO, Orense 32004, Spain
[3] SERGAS UVIGO, Galicia Sur Hlth Res Inst IIS Galicia Sur, SING Res Grp, Vigo, Spain
Keywords
Burst Processing; Data Pre-processing; Java; Pipeline Frameworks; Optimization
DOI
10.9781/ijimai.2021.10.004
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Over the last several years, data mining has emerged and developed into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data, and for this reason several important companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for developing computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, these frameworks offer very limited early error detection techniques and developer support mechanisms. In this context, this study introduces several improvements: the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
Pages: 214-224
Page count: 11
Related papers
50 in total
  • [21] Efficient Pre-Processing Techniques for Improving Classifiers Performance
    Nickolas, S.
    Shobha, K.
    JOURNAL OF WEB ENGINEERING, 2022, 21 (02): 203-228
  • [22] Improving Optical Braille Recognition in Pre-processing stage
    Murthy, Vishwanath Venkatesh
    Hanumanthappa, M.
    IEEE INTERNATIONAL CONFERENCE ON SOFT-COMPUTING AND NETWORK SECURITY (ICSNS 2018), 2018: 179-181
  • [23] Improving Localization Accuracy of Neural Sources by Pre-processing: Demonstration With Infant MEG Data
    Clarke, Maggie D.
    Larson, Eric
    Peterson, Erica R.
    McCloy, Daniel R.
    Bosseler, Alexis N.
    Taulu, Samu
    FRONTIERS IN NEUROLOGY, 2022, 13
  • [24] Data Pre-processing Based on Convolutional Neural Network for Improving Precision of Indoor Positioning
    Lu, Eric Hsueh-Chan
    Chang, Kuei-Hua
    Ciou, Jing-Mei
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I, 2020, 12033: 545-552
  • [25] Online calibration and pre-processing of TAMA data
    Tatsumi, D
    Tsunesada, Y
    CLASSICAL AND QUANTUM GRAVITY, 2004, 21 (05): S451-S456
  • [26] Data pre-processing pipeline generation for AutoETL
    Giovanelli, Joseph
    Bilalli, Besim
    Abello, Alberto
    INFORMATION SYSTEMS, 2022, 108
  • [27] Application of pre-processing of NIRS modeling data
    Wang Zhihong
    Lin Jun
    PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION & INSTRUMENTATION, VOLS 1 - 3, 2006: 295-298
  • [28] Parallel Pre-processing of Affymetrix Microarray Data
    Guzzi, Pietro Hiram
    Cannataro, Mario
    EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586: 225-232
  • [29] SumatraTT: a generic data pre-processing system
    Aubrecht, P
    Miksovsky, P
    Král, L
    14TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2003: 120-124
  • [30] A study on data pre-processing in reverse engineering
    Liu Deping
    Shangguan Jianlin
    Chen Jianjun
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MECHANICAL TRANSMISSIONS, VOLS 1 AND 2, 2006: 1428-1432