Improving Pipelining Tools for Pre-processing Data

Cited by: 1
Authors
Novo-Loures, Maria [1, 2, 3]
Lage, Yeray [1]
Pavon, Reyes [1, 2, 3]
Laza, Rosalia [1, 2, 3]
Ruano-Ordas, David [1, 2, 3]
Ramon Mendez, Jose [1, 2, 3]
Affiliations
[1] Univ Vigo, Dept Comp Sci, ESEI Escuela Super Ingn Informat, Edificio Politecn, Campus Univ Lagoas S-N, Orense 32004, Spain
[2] Univ Vigo, Dept Comp Sci, Res Grp SI4, CINBIO, Orense 32004, Spain
[3] SERGAS UVIGO, Galicia Sur Hlth Res Inst IIS Galicia Sur, SING Res Grp, Vigo, Spain
Keywords
Burst Processing; Data Pre-processing; Java; Pipeline Frameworks; Optimization
DOI
10.9781/ijimai.2021.10.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, data mining has emerged and matured into a powerful tool that adds value to business and research. Data mining makes it possible to explore and find previously unseen connections between variables and facts observed in different domains, helping us to better understand reality. The programming methods and frameworks used to analyse data have evolved over time. Currently, pipelining schemes are the most reliable way of analysing data, and as a result several major companies now offer this kind of service. Moreover, several frameworks compatible with different programming languages are available for developing computational pipelines, and many research studies have addressed the optimization of data processing speed. However, as this study shows, these frameworks offer very limited early error detection and developer support mechanisms. In this context, this study introduces several improvements, including different types of constraints for the early detection of errors, functions that facilitate debugging of specific tasks within a pipeline, the invalidation of erroneous instances, and a burst-processing scheme. By adding these functionalities, we developed Big Data Pipelining for Java (BDP4J, https://github.com/sing-group/bdp4j), a fully functional new pipelining framework that shows the potential of these features.
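To make three of the features listed in the abstract more concrete, here is a minimal Java sketch written under our own assumptions. The classes `Instance`, `Pipe`, `FunctionPipe` and `Pipeline` are hypothetical and do not reproduce the BDP4J API in the repository above. The sketch makes three modelling choices:

- A constraint is a declared input/output type that is checked when a task is added, so a mismatch is reported before any data is processed.
- An erroneous instance is marked invalid and skipped by later tasks instead of aborting the run.
- Burst processing means pushing a whole batch of instances through one task before the next task starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; these classes are NOT the BDP4J API.

/** A unit of data flowing through the pipeline, which can be marked invalid. */
final class Instance {
    private Object data;
    private boolean valid = true;
    private String invalidReason;

    Instance(Object data) { this.data = data; }

    Object getData() { return data; }
    void setData(Object data) { this.data = data; }
    boolean isValid() { return valid; }
    String getInvalidReason() { return invalidReason; }

    /** Marks the instance as erroneous so later tasks skip it. */
    void invalidate(String reason) {
        this.valid = false;
        this.invalidReason = reason;
    }
}

/** A pipeline task that declares the data type it consumes and produces. */
interface Pipe {
    String name();
    Class<?> inputType();
    Class<?> outputType();
    void process(Instance instance);
}

/** Adapts a plain function into a Pipe; a runtime failure invalidates the instance. */
final class FunctionPipe<I, O> implements Pipe {
    private final String name;
    private final Class<I> in;
    private final Class<O> out;
    private final Function<I, O> fn;

    FunctionPipe(String name, Class<I> in, Class<O> out, Function<I, O> fn) {
        this.name = name;
        this.in = in;
        this.out = out;
        this.fn = fn;
    }

    public String name() { return name; }
    public Class<?> inputType() { return in; }
    public Class<?> outputType() { return out; }

    public void process(Instance instance) {
        try {
            instance.setData(fn.apply(in.cast(instance.getData())));
        } catch (RuntimeException e) {
            // Record the error on the instance instead of aborting the whole run.
            instance.invalidate(name + ": " + e.getMessage());
        }
    }
}

/** Ordered list of tasks with construction-time checks and burst execution. */
final class Pipeline {
    private final List<Pipe> pipes = new ArrayList<>();

    /** Early error detection: rejects a task whose input type cannot accept the previous output. */
    Pipeline add(Pipe pipe) {
        if (!pipes.isEmpty()) {
            Pipe previous = pipes.get(pipes.size() - 1);
            if (!pipe.inputType().isAssignableFrom(previous.outputType())) {
                throw new IllegalStateException(pipe.name() + " expects "
                        + pipe.inputType().getSimpleName() + " but " + previous.name()
                        + " produces " + previous.outputType().getSimpleName());
            }
        }
        pipes.add(pipe);
        return this;
    }

    /** Burst processing: every instance passes through one task before the next task starts. */
    List<Instance> processBurst(List<Instance> burst) {
        for (Pipe pipe : pipes) {
            for (Instance instance : burst) {
                if (instance.isValid()) { // invalidated instances are skipped
                    pipe.process(instance);
                }
            }
        }
        return burst;
    }
}
```

For example, take a pipeline built from a hypothetical `parse` task (`String` to `Integer`) followed by a `square` task (`Integer` to `Integer`). It turns the burst `"3"`, `"oops"`, `"4"` into `9`, an invalid instance whose reason starts with `parse:`, and `16`. Trying to append a task that expects a `String` after `square` is rejected when the pipeline is built, before any data flows.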
Pages: 214-224
Number of pages: 11