Implementation of Data Preprocessing Techniques on Distributed Big Data Platforms

Cited by: 1
Authors
Celik, Oguz [1]
Hasanbasoglu, Muruvvet [1]
Aktas, Mehmet S. [1]
Kalipsiz, Oya [1]
Kanli, Alper Nebi [2]
Affiliations
[1] Yildiz Tech Univ, Dept Comp Engn, Istanbul, Turkey
[2] Cybersoft, R&D Ctr, Istanbul, Turkey
Keywords
Big Data; Distributed Computing; Outlier Analysis; Missing Value Imputation
DOI
10.1109/ubmk.2019.8907230
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We are now in the era of Big Data, yet the need for tools that can process and analyze such data has not been met. Big data mining aims to extract meaningful and valuable information from voluminous data that traditional data mining tools cannot handle. One of the most vital steps of any data mining process is data preprocessing. Our aim was to provide distributed implementations of several algorithms for two data preprocessing steps: outlier analysis and missing value imputation. The algorithms were implemented on Spark, and this paper focuses on their details and performance on different distributed system setups.
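The abstract does not say which outlier or imputation algorithms were implemented, so the following is only a minimal sketch under stated assumptions: mean imputation and a z-score outlier rule, both driven by per-partition summaries that are merged in a reduce step, which is the general pattern a Spark job would follow. It uses plain Python lists to stand in for partitions; the data, the names `partial_stats` and `merge`, and the threshold are all hypothetical and are not taken from the paper.

```python
from functools import reduce
from math import sqrt

# Hypothetical data: each inner list stands in for one partition on a worker.
# None marks a missing value.
partitions = [
    [10.0, 12.0, None, 11.0],
    [9.0, None, 13.0, 95.0],
    [11.0, 10.0, 12.0, 9.0],
]


def partial_stats(part):
    """Map step: (count, sum, sum of squares) over the non-missing values."""
    vals = [v for v in part if v is not None]
    return (len(vals), sum(vals), sum(v * v for v in vals))


def merge(a, b):
    """Reduce step: combine two partial summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


# Only the small summaries travel between workers, never the raw rows.
n, total, total_sq = reduce(merge, map(partial_stats, partitions))
mean = total / n
std = sqrt(max(total_sq / n - mean * mean, 0.0))  # population standard deviation

# Missing value imputation (assumed technique: global mean fill).
imputed = [[mean if v is None else v for v in part] for part in partitions]

# Outlier analysis (assumed technique: z-score rule, illustrative threshold).
Z_THRESHOLD = 2.0
outliers = [
    v
    for part in partitions
    for v in part
    if v is not None and abs(v - mean) > Z_THRESHOLD * std
]
```

Because each partition contributes only three numbers, the summaries can be merged in any order, which is what makes this kind of preprocessing straightforward to distribute. Note that the mean used here is itself pulled upward by the extreme value; a more robust variant would use a different statistic, such as the median.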
Pages: 73-78
Page count: 6