Toward feature selection in big data preprocessing based on hybrid cloud-based model

Cited by: 0
Authors
Noha Shehab
Mahmoud Badawy
H Arafat Ali
Affiliations
[1] Mansoura University, Computers and Control Systems Engineering Department, Faculty of Engineering
[2] Ministry of Communications and Information Technology, Information Technology Institute, Open Source Dept.
[3] Taibah University
[4] Computer Science and Information Dept.
Source
Journal of Supercomputing, 2022, 78 (03)
Keywords
Analysis; Big data; Classification; Cloud; Feature selection; Firefly; WKNN
DOI
Not available
Abstract
Big data has recently drawn wide attention in fields such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial for converting data into more specific information that can be fed to decision-making systems. As datasets grow more diverse and complex, knowledge discovery becomes more difficult. One solution is feature subset selection, a preprocessing step that reduces this complexity and makes computation and analysis more tractable. Preprocessing produces a reliable and suitable source for any data-mining algorithm. Effective feature selection can improve a model's performance and help reveal the characteristics and underlying structure of complex data. This study introduces a novel hybrid cloud-based feature selection model for imbalanced data, built on the k-nearest neighbor algorithm. The proposed model combines the firefly distance metric with the Euclidean distance used in k-nearest neighbor. Compared with the simple weighted nearest neighbor algorithm, the experimental results showed improvements in both processing time and feature weights, and classification accuracy improved by 12%. Using the cloud-distributed model reduced processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.
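As a rough illustration of the weighted nearest neighbor baseline named in the abstract, the sketch below classifies a query point using a feature-weighted Euclidean distance and distance-weighted voting. It is not the authors' model: the abstract does not say how the firefly metric is combined with the Euclidean distance, so that part is left out. The function names (`weighted_euclidean`, `wknn_predict`), the toy data, and the hand-picked feature weights are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def weighted_euclidean(a, b, weights):
    """Euclidean distance in which each feature's squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def wknn_predict(train_X, train_y, query, weights, k=3):
    """Predict a label from the k nearest training points, each voting with 1/distance."""
    dists = sorted(
        (weighted_euclidean(x, query, weights), label)
        for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for dist, label in dists[:k]:
        votes[label] += 1.0 / (dist + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
train_X = [(0.0, 5.0), (0.2, -4.0), (1.0, 4.5), (1.2, -5.0)]
train_y = ["a", "a", "b", "b"]
query = (0.1, -5.0)

# A weight of 0 removes the noisy feature from the distance; equal weights keep it.
selected = wknn_predict(train_X, train_y, query, weights=(1.0, 0.0), k=3)
unselected = wknn_predict(train_X, train_y, query, weights=(1.0, 1.0), k=3)
print(selected, unselected)
```

On this toy data the query is labeled "a" when the noisy feature is weighted out and "b" when both features count equally, which is the kind of effect feature weighting is meant to control. In practice the weights would be learned by a search procedure rather than set by hand.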
Pages: 3226-3265
Page count: 39
  • [1] Toward feature selection in big data preprocessing based on hybrid cloud-based model
    Shehab, Noha
    Badawy, Mahmoud
    Ali, H. Arafat
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (03): 3226 - 3265