On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing-A Systematic Review

Cited by: 0
Authors
Xie, Jiarui [1]
Sun, Lijun [1, 2]
Zhao, Yaoyao Fiona [1]
Affiliations
[1] McGill Univ, Dept Mech Engn, Addit Design & Mfg Lab, Montreal, PQ H3A 0G4, Canada
[2] McGill Univ, Dept Civil Engn, Smart Transportat Lab, Montreal, PQ H3A 0G4, Canada
Source
ENGINEERING | 2025 / Vol. 45
Keywords
Machine learning; Design and manufacturing; Data quality; Data augmentation; Active learning; CONVOLUTIONAL NEURAL-NETWORK; DATA GOVERNANCE; DEEP; FRAMEWORK; VISION; METHODOLOGY; INSPECTION; SELECTION; MODEL;
DOI
10.1016/j.eng.2024.04.024
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled learning ability using existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation on how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance and their applications in this domain are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.
© 2024 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 105-131
Page count: 27
Related papers
50 in total
  • [31] Machine Learning-Based Resource Management in Fog Computing: A Systematic Literature Review
    Khan, Fahim Ullah
    Shah, Ibrar Ali
    Jan, Sadaqat
    Ahmad, Shabir
    Whangbo, Taegkeun
    SENSORS, 2025, 25 (03)
  • [32] MACHINE LEARNING-BASED PREDICTION MODELS FOR C DIFFICILE INFECTION: A SYSTEMATIC REVIEW
    Tariq, Raseen
    Redij, Renisha
    Arunachalam, Shivaram Poigai
    Faubion, William
    Khanna, Sahil
    GASTROENTEROLOGY, 2023, 164 (06) : S1176 - S1176
  • [33] Machine Learning-Based Predictive Models for Patients with Venous Thromboembolism: A Systematic Review
    Danilatou, Vasiliki
    Dimopoulos, Dimitrios
    Kostoulas, Theodoros
    Douketis, James
    THROMBOSIS AND HAEMOSTASIS, 2024, 124 (11) : 1040 - 1052
  • [34] Machine Learning-Based Prediction Models for Clostridioides difficile Infection: A Systematic Review
    Tariq, Raseen
    Malik, Sheza
    Redij, Renisha
    Arunachalam, Shivaram
    Faubion, Jr William A.
    Khanna, Sahil
    CLINICAL AND TRANSLATIONAL GASTROENTEROLOGY, 2024, 15 (06)
  • [35] Machine learning-based performance predictions for steels considering manufacturing process parameters: a review
    Fang, Wei
    Huang, Jia-xin
    Peng, Tie-xu
    Long, Yang
    Yin, Fu-xing
    JOURNAL OF IRON AND STEEL RESEARCH INTERNATIONAL, 2024, 31 (07) : 1555 - 1581
  • [36] Machine Learning-Based Prediction of Air Quality
    Liang, Yun-Chia
    Maimury, Yona
    Chen, Angela Hsiang-Ling
    Juarez, Josue Rodolfo Cuevas
    APPLIED SCIENCES-BASEL, 2020, 10 (24) : 1 - 17
  • [37] A Weighted Machine Learning-Based Attacks Classification to Alleviating Class Imbalance
    Chkirbene, Zina
    Erbad, Aiman
    Hamila, Ridha
    Gouissem, Ala
    Mohamed, Amr
    Guizani, Mohsen
    Hamdi, Mounir
    IEEE SYSTEMS JOURNAL, 2021, 15 (04) : 4780 - 4791
  • [38] Pitfalls of Machine Learning-Based Personnel Selection: Fairness, Transparency, and Data Quality
    Goretzko, David
    Finja Israel, Laura Sophia
    JOURNAL OF PERSONNEL PSYCHOLOGY, 2022, 21 (01) : 37 - 47
  • [39] Data Curation and Quality Evaluation for Machine Learning-Based Cyber Intrusion Detection
    Tran, Ngan
    Chen, Haihua
    Bhuyan, Jay
    Ding, Junhua
    IEEE ACCESS, 2022, 10 : 121900 - 121923
  • [40] Deep reinforcement learning-based dynamic scheduling for resilient and sustainable manufacturing: A systematic review
    Zhang, Chao
    Juraschek, Max
    Herrmann, Christoph
    JOURNAL OF MANUFACTURING SYSTEMS, 2024, 77 : 962 - 989