Toward feature selection in big data preprocessing based on hybrid cloud-based model

Cited by: 0
Authors
Noha Shehab
Mahmoud Badawy
H Arafat Ali
Affiliations
[1] Mansoura University, Computers and Control Systems Engineering Department, Faculty of Engineering
[2] Ministry of Communications and Information Technology, Information Technology Institute, Open Source Dept.
[3] Taibah University
[4] Computer Science and Information Dept.
Keywords
Analysis; Big data; Classification; Cloud; Feature selection; Firefly; WKNN
DOI
Not available
Abstract
Big data has recently attracted wide attention in fields such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial for converting data into more specific information that is fed to decision-making systems. As datasets become more diverse and complex, knowledge discovery becomes more difficult. One solution is feature subset selection, a preprocessing step that reduces this complexity and makes computation and analysis more tractable. Preprocessing produces a reliable and suitable source for any data-mining algorithm. Effective feature selection can improve a model's performance and help reveal the characteristics and underlying structure of complex data. This study introduces a novel hybrid, cloud-based feature selection model for imbalanced data based on the k-nearest neighbor algorithm. The proposed model combines the firefly distance metric with the Euclidean distance used in k-nearest neighbor. In the experiments, it compared favorably with the simple weighted nearest neighbor algorithm in both time usage and feature weights, and it improved classification accuracy by 12%. Using the cloud-distributed model reduced processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.
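The abstract does not spell out how the firefly metric and the Euclidean distance are combined, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes per-feature weights are already available, uses them inside a weighted Euclidean distance, and lets each of the k nearest neighbors vote with the standard firefly attractiveness term beta0 * exp(-gamma * r^2). The function names and the parameters `beta0`, `gamma`, and `feature_weights` are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def weighted_euclidean(a, b, feature_weights):
    """Euclidean distance where each feature's squared difference is scaled by a weight.

    Assumption: the weights come from some prior feature selection step.
    """
    return math.sqrt(
        sum(w * (x - y) ** 2 for x, y, w in zip(a, b, feature_weights))
    )


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Firefly-style attractiveness, decaying with the squared distance r."""
    return beta0 * math.exp(-gamma * r * r)


def predict(train_X, train_y, query, feature_weights, k=3, gamma=1.0):
    """Classify `query` by an attractiveness-weighted vote of its k nearest neighbors."""
    neighbors = sorted(
        ((weighted_euclidean(x, query, feature_weights), label)
         for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for r, label in neighbors[:k]:
        # Closer neighbors contribute more to the vote (assumed combination rule).
        votes[label] += attractiveness(r, gamma=gamma)
    return max(votes, key=votes.get)


# Toy data, invented for illustration only.
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["a", "a", "b", "b"]
label_near_a = predict(train_X, train_y, [0.2, 0.1], [1.0, 1.0], k=3)
label_near_b = predict(train_X, train_y, [5.5, 5.0], [1.0, 1.0], k=3)
```

Setting a feature's weight to zero removes it from the distance entirely, which is one simple way a weighting scheme can act as feature selection.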
Pages: 3226-3265
Page count: 39