A Survey of Mislabeled Training Data Detection Techniques for Pattern Classification

Cited by: 16
Authors
Guan, Donghai [1, 2]
Yuan, Weiwei [1, 3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[3] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Ensemble learning; Local learning; Mislabeled data detection; Nearest-neighbor rule; Noise detection; Ensemble; Quality; Elimination; Microarrays
DOI
10.4103/0256-4602.125689
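Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract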
Pattern classification is an important part of machine learning. A classifier is trained on the training data and then predicts labels for future unseen data. The quality of the training data plays an important role in obtaining a classifier with good performance. Unfortunately, in many areas it is difficult to provide absolutely clean data. This paper focuses on mislabeled data, one of the main types of noisy data. A number of mislabeled data detection techniques have been proposed; however, no survey has yet summarized them. This paper reviews the existing studies and classifies them into three types: local learning-based, ensemble learning-based, and single learning-based methods. The technical details, advantages, and disadvantages of these methods are discussed.
Pages: 524-530
Number of pages: 7