Improving Active Learning Performance through the Use of Data Augmentation

Cited by: 2
Authors
Fonseca, Joao [1]
Bacao, Fernando [1]
Affiliations
[1] Univ Nova Lisboa, NOVA Informat Management Sch, Lisbon, Portugal
Keywords
CLASSIFICATION; SELECTION; MACHINE;
DOI
10.1155/2023/7941878
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Active learning (AL) is a well-known technique for optimizing data usage in training: observations are interactively selected from a large pool of unlabeled data to be labeled by a supervisor. Its goal is to find the unlabeled observations that, once labeled, maximize the informativeness of the training dataset, thereby reducing data-related costs. The literature describes several methods to improve the effectiveness of this process. Nonetheless, little research addresses the application of artificial data sources in AL, especially outside image classification and natural language processing (NLP). This paper proposes a new AL framework that relies on the effective use of artificial data. It may be used with any classifier, generation mechanism, and data type, and it can be integrated with multiple other state-of-the-art AL contributions. This combination is expected to increase the machine learning (ML) classifier's performance and to reduce both the supervisor's involvement and the amount of labeled data required, at the expense of a marginal increase in computational time. The proposed method introduces a hyperparameter optimization component to improve the generation of artificial instances during the AL process, as well as an uncertainty-based data generation mechanism. We compare the proposed method with the standard AL framework and with an oversampling-based AL method designed for more informed data generation. Performance was tested using four different classifiers, two AL-specific performance metrics, and three classification performance metrics over 15 different datasets. The results demonstrate that the proposed framework, using data augmentation, significantly improved the performance of AL in terms of both classification performance and data selection efficiency (all the code and preprocessed data developed for this study are available at ).
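To make the idea in the abstract concrete, the following is a minimal sketch of a pool-based AL loop in which the labeled set is enlarged with artificial observations before each training step. It is not the authors' implementation: the nearest-centroid classifier, the SMOTE-like interpolation in `augment`, the margin-based `uncertainty` score, and every function and parameter name are assumptions chosen only to keep the example dependency-free. The hyperparameter optimization component and the uncertainty-based generation mechanism mentioned in the abstract are omitted. The sketch also assumes the initial labeled set contains at least two classes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def group_by_class(X, y):
    """Map each label to the list of observations carrying it."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def fit_centroids(X, y):
    """Toy classifier (assumption): one mean vector per class."""
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in group_by_class(X, y).items()}


def predict(model, x):
    """Label of the closest class centroid."""
    return min(model, key=lambda c: sq_dist(model[c], x))


def uncertainty(model, x):
    """Negative margin between the two closest centroids (higher = less certain)."""
    d = sorted(sq_dist(center, x) for center in model.values())
    return d[0] - d[1]


def augment(X, y, n_new, rng):
    """Generator (assumption): interpolate random same-class pairs, SMOTE-like."""
    groups = group_by_class(X, y)
    classes = sorted(groups)
    X_new, y_new = [], []
    for _ in range(n_new):
        c = rng.choice(classes)
        a, b = rng.choice(groups[c]), rng.choice(groups[c])
        t = rng.random()
        X_new.append([p + t * (q - p) for p, q in zip(a, b)])
        y_new.append(c)
    return X_new, y_new


def train(X_lab, y_lab, n_aug, rng):
    """Fit the classifier on real plus artificial observations."""
    X_art, y_art = augment(X_lab, y_lab, n_aug, rng)
    return fit_centroids(X_lab + X_art, y_lab + y_art)


def active_learning(pool, oracle, X_init, y_init, n_iter, batch, n_aug, seed=0):
    """Each iteration: augment, train, query the most uncertain points, label them."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_init), list(y_init), list(pool)
    for _ in range(n_iter):
        model = train(X_lab, y_lab, n_aug, rng)
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        chosen, pool = pool[:batch], pool[batch:]
        X_lab += chosen
        y_lab += [oracle(x) for x in chosen]  # the supervisor labels the queries
    return train(X_lab, y_lab, n_aug, rng), X_lab, y_lab


# Hypothetical toy data: two well-separated square clusters.
data_rng = random.Random(42)
class0 = [[data_rng.random(), data_rng.random()] for _ in range(30)]
class1 = [[3 + data_rng.random(), 3 + data_rng.random()] for _ in range(30)]
truth = {tuple(x): 0 for x in class0}
truth.update({tuple(x): 1 for x in class1})

model, X_lab, y_lab = active_learning(
    pool=class0[1:] + class1[1:],
    oracle=lambda x: truth[tuple(x)],
    X_init=[class0[0], class1[0]], y_init=[0, 1],
    n_iter=3, batch=4, n_aug=10,
)
```

Only the labeled observations are passed to `augment`, so the artificial data never uses labels the supervisor has not provided. Swapping in a different classifier or generator would only require replacing `fit_centroids`, `uncertainty`, and `augment`.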
Pages: 17
Related papers
50 in total
  • [1] Selective Data Augmentation for Improving the Performance of Offline Reinforcement Learning
    Han, Jungwoo
    Kim, Jinwhan
    [J]. 2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 222 - 226
  • [2] Improving the Performance of Just-In-Time Learning-Based Soft Sensor Through Data Augmentation
    Jiang, Xiaoyu
    Ge, Zhiqiang
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2022, 69 (12) : 13716 - 13726
  • [3] ACAMDA: Improving Data Efficiency in Reinforcement Learning Through Guided Counterfactual Data Augmentation
    Sun, Yuewen
    Wang, Erli
    Huang, Biwei
    Lu, Chaochao
    Feng, Lu
    Sun, Changyin
    Zhang, Kun
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15193 - 15201
  • [4] Improving Organizational Performance Through the Use of Big Data
    Ghasemaghaei, Maryam
    [J]. JOURNAL OF COMPUTER INFORMATION SYSTEMS, 2020, 60 (05) : 395 - 408
  • [5] Improving Deep Learning with Generic Data Augmentation
    Taylor, Luke
    Nitschke, Geoff
    [J]. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1542 - 1547
  • [6] Improving Deep Learning Parkinson's Disease Detection Through Data Augmentation Training
    Taleb, Catherine
    Likforman-Sulem, Laurence
    Mokbel, Chafic
    [J]. PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 1144 : 79 - 93
  • [7] Improving Deep Learning for Maritime Remote Sensing through Data Augmentation and Latent Space
    Sobien, Daniel
    Higgins, Erik
    Krometis, Justin
    Kauffman, Justin
    Freeman, Laura
    [J]. MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2022, 4 (03) : 665 - 687
  • [8] Improving the Transferability of Adversarial Samples through Automatically Learning Augmentation Strategies from Data
    Xu, Ru-Zhi
    Lyu, Chang-Ran
    [J]. International Journal of Network Security, 2023, 25 (06) : 983 - 991
  • [9] Improving the Performance of Fog Computing through the use of Data Locality
    Steffenel, Luiz Angelo
    [J]. 2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 217 - 224
  • [10] Improving Intrusion Detection Through Training Data Augmentation
    Otokwala, Uneneibotejit
    Petrovski, Andrei
    Kalutarage, Harsha
    [J]. 2021 14TH INTERNATIONAL CONFERENCE ON SECURITY OF INFORMATION AND NETWORKS (SIN 2021), 2021,