Clustering-based adaptive data augmentation for class-imbalance in machine learning (CADA): additive manufacturing use case

Cited by: 6
Authors
Dasari, Siva Krishna [1, 2]
Cheddad, Abbas [1]
Palmquist, Jonatan [2]
Lundberg, Lars [1]
Affiliations
[1] Blekinge Inst Technol, Dept Comp Sci, S-37141 Karlskrona, Sweden
[2] GKN Aerosp Engine Syst Sweden, Proc Engn Dept, Dept 9635-TL3, SE-46181 Trollhattan, Sweden
Source
NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 37 / Issue 2
Keywords
Class-imbalance; Melt-pool defects classification; Aerospace application; Additive manufacturing; Polar transformation; Random forests; CLASSIFICATION;
DOI
10.1007/s00521-022-07347-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large amounts of data are generated from in-situ monitoring of additive manufacturing (AM) processes and are later used in predictive modelling for defect classification to speed up quality inspection of products. Most of this process data is defect-free (majority class) and only a small share contains defects (minority class), which results in a class-imbalance issue. Trained on imbalanced datasets, classifiers often give sub-optimal results, i.e. better performance on the majority class than on the minority class. However, it is important for process engineers that models classify defects more accurately than the defect-free class, since this is crucial for quality inspection. Hence, we address the class-imbalance issue in manufacturing process data to support in-situ quality control of additively manufactured components. For this, we propose cluster-based adaptive data augmentation (CADA) for oversampling. Quantitative experiments are conducted to evaluate the proposed method and to compare it with other selected oversampling methods using AM datasets from the aerospace industry and a publicly available casting manufacturing dataset. The results show that CADA outperformed random oversampling and the SMOTE method and performed similarly to random data augmentation and cluster-based oversampling. Furthermore, the results of the statistical significance test show that there is a significant difference between the studied methods. As such, the CADA method can be considered an alternative oversampling method for improving the performance of models on the minority class.
Pages: 597 - 610
Page count: 14
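
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of plain random oversampling, the simplest of the methods CADA is compared against. It is not the CADA method and is not taken from the paper. The function name `random_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=None):
    """Balance classes by duplicating randomly chosen rows of each smaller class.

    Illustrative baseline only (an assumption for this sketch, not the
    paper's CADA procedure): every class is topped up to the size of the
    largest class by sampling its own rows with replacement.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples = list(samples)
    out_labels = list(labels)
    for label, count in counts.items():
        # Rows belonging to this class in the original data.
        pool = [s for s, l in zip(samples, labels) if l == label]
        # Draw the missing number of rows with replacement (none for the largest class).
        extra = rng.choices(pool, k=target - count)
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels


# Toy data: 8 defect-free rows (label 0) and 2 defective rows (label 1).
toy_samples = [[float(i), float(i) * 2.0] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2

balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels, seed=0)
class_counts = Counter(balanced_labels)
```

Because this baseline only duplicates existing minority rows, it adds no new variation to the minority class. Per the abstract, augmentation-based and cluster-based methods such as CADA are proposed as alternatives to it.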