Improving Pipelining Tools for Pre-processing Data

Cited by: 1
Authors
Novo-Loures, Maria [1, 2, 3]
Lage, Yeray [1]
Pavon, Reyes [1, 2, 3]
Laza, Rosalia [1, 2, 3]
Ruano-Ordas, David [1, 2, 3]
Ramon Mendez, Jose [1, 2, 3]
Affiliations
[1] Univ Vigo, Dept Comp Sci, ESEI Escuela Super Ingn Informat, Edificio Politecn,Campus Univ Lagoas S-N, Orense 32004, Spain
[2] Univ Vigo, Dept Comp Sci, Res Grp SI4, CINBIO, Orense 32004, Spain
[3] SERGAS UVIGO, Galicia Sur Hlth Res Inst IIS Galicia Sur, SING Res Grp, Vigo, Spain
Keywords
Burst Processing; Data Pre-processing; Java; Pipeline Frameworks; Optimization
DOI
10.9781/ijimai.2021.10.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Over the last several years, data mining has emerged and matured into a powerful tool that adds value to business and research. It makes it possible to explore and find unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data, and for this reason several important companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for developing computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, early error detection techniques and developer support mechanisms are very limited in these frameworks. In this context, this study introduces different improvements, such as the design of different types of constraints for the early detection of errors, the creation of functions to facilitate debugging of concrete tasks included in a pipeline, the invalidation of erroneous instances, and the introduction of the burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
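The features named in the abstract can be illustrated with a minimal Java sketch. It is not BDP4J's API: every class and method name below (`PipelineSketch`, `Instance`, `Task`, `checkTypes`, `runBurst`, `demo`) is hypothetical, and the reading of "burst processing" as "push the whole batch through one task before starting the next" is an assumption, since the abstract does not define the term. The sketch shows a type constraint checked before any data is processed, invalidation of instances whose task fails, and batch-wise execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from BDP4J.
public class PipelineSketch {

    /** A unit of data flowing through the pipeline; it can be flagged invalid. */
    static final class Instance {
        private Object data;
        private boolean valid = true;

        Instance(Object data) { this.data = data; }
        Object getData() { return data; }
        void setData(Object data) { this.data = data; }
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
    }

    /** A task that declares the type it consumes and the type it produces. */
    static final class Task {
        final String name;
        final Class<?> inputType;
        final Class<?> outputType;
        final UnaryOperator<Object> body;

        Task(String name, Class<?> inputType, Class<?> outputType,
             UnaryOperator<Object> body) {
            this.name = name;
            this.inputType = inputType;
            this.outputType = outputType;
            this.body = body;
        }
    }

    /** Early constraint check: each task's output must fit the next task's input. */
    static void checkTypes(List<Task> tasks) {
        for (int i = 1; i < tasks.size(); i++) {
            Task prev = tasks.get(i - 1);
            Task next = tasks.get(i);
            if (!next.inputType.isAssignableFrom(prev.outputType)) {
                throw new IllegalStateException(
                    prev.name + " produces " + prev.outputType.getSimpleName()
                    + " but " + next.name + " expects "
                    + next.inputType.getSimpleName());
            }
        }
    }

    /** Assumed burst scheme: the whole batch passes one task before the next starts. */
    static List<Instance> runBurst(List<Task> tasks, List<Instance> batch) {
        checkTypes(tasks); // fail before touching any data
        for (Task task : tasks) {
            for (Instance inst : batch) {
                if (!inst.isValid()) {
                    continue; // invalidated instances skip all later tasks
                }
                try {
                    inst.setData(task.body.apply(inst.getData()));
                } catch (RuntimeException e) {
                    inst.invalidate(); // drop the bad instance, keep the batch going
                }
            }
        }
        List<Instance> survivors = new ArrayList<>();
        for (Instance inst : batch) {
            if (inst.isValid()) {
                survivors.add(inst);
            }
        }
        return survivors;
    }

    /** Runs a two-task pipeline over three inputs, one of which cannot be parsed. */
    static List<Instance> demo() {
        List<Task> tasks = Arrays.asList(
            new Task("parse", String.class, Integer.class,
                     o -> Integer.parseInt((String) o)),
            new Task("double", Integer.class, Integer.class,
                     o -> ((Integer) o) * 2));
        List<Instance> batch = Arrays.asList(
            new Instance("1"), new Instance("x"), new Instance("3"));
        return runBurst(tasks, batch);
    }

    public static void main(String[] args) {
        for (Instance inst : demo()) {
            System.out.println(inst.getData());
        }
    }
}
```

In this sketch the input `"x"` fails in the `parse` task and is invalidated, so only the other two instances reach `double`. Declaring input and output types on each task is one simple way to catch a mis-wired pipeline before execution; the constraints described in the paper may be richer than this.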
Pages: 214-224
Page count: 11