A Survey of mislabeled training data detection techniques for pattern classification

Cited by: 16
Authors
Guan, Donghai [1, 2]
Yuan, Weiwei [1, 3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Harbin Engn Univ, Coll Automat, Harbin, Peoples R China
[3] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Ensemble learning; Local learning; Mislabeled data detection; NEAREST-NEIGHBOR RULE; NOISE-DETECTION; ENSEMBLE; QUALITY; ELIMINATION; MICROARRAYS;
DOI
10.4103/0256-4602.125689
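Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract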
Pattern classification is an important part of machine learning. A classifier is trained on the training data and then predicts labels for future, unseen data. The quality of the training data plays an important role in obtaining a classifier with good performance. Unfortunately, in many areas it is difficult to provide perfectly clean data. This paper focuses on mislabeled data, one of the main types of noisy data. A number of mislabeled data detection techniques have been proposed; however, no survey has summarized them. This paper reviews the existing studies and classifies them into three types: local learning-based, ensemble learning-based, and single learning-based methods. The technical details, advantages, and disadvantages of these methods are discussed.
Pages: 524-530
Page count: 7