A new adaptive sampling algorithm for big data classification

Cited by: 6
Authors
Djouzi, Kheyreddine [1]
Beghdad-Bey, Kadda [1]
Amamra, Abdenour [1]
Affiliation
[1] Ecole Mil Polytech, BP 17, Algiers 16111, Algeria
Keywords
Big data; Data classification; Sampling methods; Subsampled Double Bootstrap; Naive Bayes classifier; BOOTSTRAP;
DOI
10.1016/j.jocs.2022.101653
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The exponential growth of the quantity of data that circulates on the web has led to the emergence of the big data phenomenon. This is a natural consequence of the proliferation of social media, mobile devices, the abundance of free online storage, and new technologies like the internet of things. Big data has in turn created several challenges for the computer science community, among which the large size of data is the most challenging. Traditional machine learning algorithms, used mostly for insight extraction, prove inadequate, even on high-performance computer architectures. Big data analytics algorithms can overcome the size issue by either: (1) adapting the existing machine learning techniques to the scale of the big data; or (2) sampling big datasets, that is, randomly choosing much smaller subsets of the data population, to meet what current algorithms can handle. In the present work, we follow the second alternative to address the size challenge in the big data context. We propose intelligent sampling techniques based on Scalable Simple Random Sampling (ScaSRS) and Subsampled Double Bootstrap (SDB). Tests carried out on public generic datasets show that our proposal is able to address the size dimension efficiently. The proposed algorithms were evaluated on a classification task, where the obtained results provided significant improvement compared to the state-of-the-art.
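The second alternative described in the abstract, drawing a much smaller random subset that an existing algorithm can handle, can be illustrated with a minimal sketch. This is not the authors' adaptive algorithm, nor ScaSRS or SDB themselves; it is a plain one-pass simple random sample without replacement using random keys, and the function name `random_key_sample` and its parameters are hypothetical, chosen only for illustration.

```python
import heapq
import random


def random_key_sample(stream, k, seed=None):
    """Draw k items uniformly without replacement in a single pass.

    Illustrative assumption: each item gets an independent uniform key
    and the k items with the smallest keys form the sample.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)
    # Max-heap on the key (stored negated), holding at most k entries.
    # The index breaks ties so items never need to be comparable.
    heap = []
    for index, item in enumerate(stream):
        key = rng.random()
        if len(heap) < k:
            heapq.heappush(heap, (-key, index, item))
        elif key < -heap[0][0]:
            # New key beats the largest key kept so far: swap it in.
            heapq.heapreplace(heap, (-key, index, item))
    # Return the sampled items in their original stream order.
    return [item for _, _, item in sorted(heap, key=lambda e: e[1])]


if __name__ == "__main__":
    subset = random_key_sample(range(1_000_000), k=1000, seed=42)
    print(len(subset))
```

The sample could then be passed to any standard classifier in place of the full dataset. Memory use is proportional to k rather than to the size of the data, which is the property that makes sampling attractive when the full dataset is too large to process directly.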
Pages: 14
Related papers
50 in total
  • [1] A Scalable Adaptive Sampling Based Approach for Big Data Classification
    Djouzi, Kheyreddine
    Beghdad-Bey, Kadda
    Amamra, Abdenour
    [J]. ADVANCES IN COMPUTING SYSTEMS AND APPLICATIONS, 2022, 513 : 73 - 83
  • [2] Adaptive Exponential Bat algorithm and deep learning for big data classification
    Mujeeb, S. Md
    Sam, R. Praveen
    Madhavi, K.
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2021, 46 (01):
  • [3] Adaptive Exponential Bat algorithm and deep learning for big data classification
    S Md Mujeeb
    R Praveen Sam
    K Madhavi
    [J]. Sādhanā, 2021, 46
  • [4] New mixed adaptive detection algorithm for moving target with big data
    Zhang, De-Gan
    Zhou, Shan
    Chen, Jie
    Liu, Si
    [J]. JOURNAL OF VIBROENGINEERING, 2016, 18 (07) : 4705 - 4719
  • [5] Adaptive Classification of Big Data Flight Sample
    Liu Fei
    Yin Zhiping
    Huang Qiqing
    Zhang Xiayang
    Liu Jiapeng
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTER AND COMPUTATIONAL SCIENCES (ICCCS), 2015, : 136 - 141
  • [6] Research on Data Classification Algorithm in Big Data Mining
    Liu Weigang
    [J]. 2019 2ND INTERNATIONAL CONFERENCE ON MECHANICAL, ELECTRONIC AND ENGINEERING TECHNOLOGY (MEET 2019), 2019, : 174 - 179
  • [7] Efficient kNN classification algorithm for big data
    Deng, Zhenyun
    Zhu, Xiaoshu
    Cheng, Debo
    Zong, Ming
    Zhang, Shichao
    [J]. NEUROCOMPUTING, 2016, 195 : 143 - 148
  • [8] Big Data Sampling Algorithm Based on Peak Detection
    Liu, Mengyu
    Wang, Yuhang
    Lin, Ruishi
    Wang, Shenhang
    Zheng, Wei
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 7573 - 7578
  • [9] A New Optimal Ensemble Algorithm Based on SVDD Sampling for Imbalanced Data Classification
    Pirgazi, Jamshid
    Pirmohammadi, Abbas
    Shams, Reza
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (06)
  • [10] New data strategies: nonprobability sampling, mobile, big data
    Link, Michael
    [J]. QUALITY ASSURANCE IN EDUCATION, 2018, 26 (02) : 303 - 314