On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing - A Systematic Review

Authors
Xie, Jiarui [1]
Sun, Lijun [1, 2]
Zhao, Yaoyao Fiona [1]
Affiliations
[1] McGill Univ, Dept Mech Engn, Addit Design & Mfg Lab, Montreal, PQ H3A 0G4, Canada
[2] McGill Univ, Dept Civil Engn, Smart Transportat Lab, Montreal, PQ H3A 0G4, Canada
Source
ENGINEERING | 2025 / Vol. 45
Keywords
Machine learning; Design and manufacturing; Data quality; Data augmentation; Active learning; CONVOLUTIONAL NEURAL-NETWORK; DATA GOVERNANCE; DEEP; FRAMEWORK; VISION; METHODOLOGY; INSPECTION; SELECTION; MODEL;
DOI
10.1016/j.eng.2024.04.024
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled learning ability using existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation on how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance and their applications in this domain are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified. (c) 2024 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 105-131
Page count: 27