Learning from crowds with active learning and self-healing

Cited by: 3
Authors
Shu, Zhenyu [1]
Sheng, Victor S. [2, 3]
Li, Jingjing [4]
Affiliations
[1] South Cent Univ Nationalities, Coll Elect & Informat Engn, 182 Minyuan Rd, Wuhan, Hubei, Peoples R China
[2] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou, Peoples R China
[3] Univ Cent Arkansas, Dept Comp Sci, 201 Donaghey Ave, Conway, AR USA
[4] Hubei Univ Econ, Coll Elect & Informat Engn, 8 Yangchahu Rd, Wuhan, Hubei, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2018, Vol. 30, No. 9
Funding
National Natural Science Foundation of China; National Science Foundation (USA)
Keywords
Crowdsourcing; Active learning; Supervised classification; Machine learning; NAIVE BAYES; SEGMENTATION;
DOI
10.1007/s00521-017-2878-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the development of crowdsourcing, acquiring data for supervised learning from annotators all over the world has become simple and economical. To improve accuracy, it is natural to obtain multiple noisy labels (i.e., a multiple label set) for each example from the crowd. Consensus algorithms can then infer an estimated ground truth from the multiple label set of each example. This estimated ground truth, also called an integrated label, may itself be wrong. That is, a dataset constructed by integrating the multiple noisy labels of each example in a crowdsourcing dataset (called an integrated dataset) still contains noise. To further improve the data quality of an integrated dataset, and thereby the performance of a model learned from it, this paper proposes a framework that combines active learning with the self-healing of a model. With active learning, a limited number of examples from the integrated dataset that are most likely to be noisy are selected for the oracle to correct; with the self-healing of a model, the data quality of the integrated dataset can also be improved automatically. From our experimental results on eight simulated crowdsourcing datasets with three popular consensus algorithms, we draw the following conclusions. (1) Our proposed framework does improve the performance of a model learned from the integrated dataset. (2) The simple active learning selection strategy based on uncertainty estimation can identify noisy examples in the integrated dataset. (3) Self-healing improves the data quality of the integrated dataset efficiently and effectively, and thereby improves the accuracy of a model learned from it. We further investigate the proposed framework on a real-world crowdsourcing dataset collected from Amazon Mechanical Turk, and the above conclusions hold.
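The pipeline the abstract describes (integrate noisy labels, send the most uncertain examples to an oracle, let the model heal the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: majority voting stands in for the unspecified consensus algorithms, uncertainty is taken to be one minus the largest predicted class probability, and the `self_heal` rule and its `threshold` parameter are hypothetical, since the abstract does not say how self-healing is performed. All function names are illustrative.

```python
from collections import Counter


def majority_vote(label_sets):
    """Integrate each example's multiple noisy labels into one label.

    Assumption: majority voting as the consensus algorithm; ties go to
    the smallest label value.
    """
    integrated = []
    for labels in label_sets:
        counts = Counter(labels)
        top = max(counts.values())
        integrated.append(min(l for l, c in counts.items() if c == top))
    return integrated


def uncertainty(proba):
    """Assumed uncertainty score: 1 minus the largest class probability."""
    return 1.0 - max(proba)


def select_for_oracle(probas, budget):
    """Return indices of the `budget` most uncertain examples."""
    order = sorted(range(len(probas)),
                   key=lambda i: uncertainty(probas[i]),
                   reverse=True)
    return order[:budget]


def self_heal(integrated, probas, threshold=0.9):
    """Hypothetical self-healing rule: overwrite an integrated label when
    the model confidently predicts a different class."""
    healed = list(integrated)
    for i, p in enumerate(probas):
        pred = max(range(len(p)), key=lambda k: p[k])
        if p[pred] >= threshold and pred != healed[i]:
            healed[i] = pred
    return healed


# Toy data: four examples, three crowd labels each, binary classes.
label_sets = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
integrated = majority_vote(label_sets)

# Class probabilities from some model trained on the integrated dataset
# (made-up values for illustration).
probas = [[0.05, 0.95], [0.55, 0.45], [0.92, 0.08], [0.60, 0.40]]

to_correct = select_for_oracle(probas, budget=2)
healed = self_heal(integrated, probas)
```

In this toy run the two least confident examples are queued for the oracle, while the third example, where the model is confident and disagrees with the integrated label, is relabeled automatically. How the two steps are interleaved in the actual framework is not stated in the abstract.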
Pages: 2883-2894
Page count: 12