Regression tree-based active learning

Cited by: 0
Authors
Ashna Jose
João Paulo Almeida de Mendonça
Emilie Devijver
Noël Jakse
Valérie Monbet
Roberta Poloni
Affiliations
[1] Univ. Grenoble Alpes, CNRS, Grenoble INP, SIMaP
[2] Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG
[3] Univ. Rennes and Inria, CNRS, IRMAR-UMR 6625
Source: Data Mining and Knowledge Discovery
Keywords
Active learning; Non-parametric regression; Standard regression trees; Query-based learning
DOI: not available
Abstract
Machine learning algorithms often require large training sets to perform well, but labeling such large amounts of data is not always feasible: in many applications, substantial human effort and material cost are needed. Finding effective ways to reduce the size of training sets while maintaining the same performance is therefore crucial: one wants to choose the best fixed-size sample to be labeled among a given population, aiming at an accurate prediction of the response. This challenge has been studied in detail for classification, but much less so for regression, which is known to be a more difficult task for active learning despite its practical importance. The few model-free active learning methods that select new samples to label using only unlabeled data ignore the conditional distribution of the response given the features. In this paper, we propose a standard regression tree-based active learning method for regression that improves significantly upon existing active learning approaches. It provides strong results for both small and large training sets, with appreciably low variance across several runs. We also adapt model-free approaches to our algorithm so that maximum information is exploited. Through experiments on numerous benchmark datasets, we demonstrate that our framework improves upon existing methods and is effective in learning a regression model from a very limited labeled dataset, reducing the sample size required for a fixed level of performance, even when many features are present.
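As a rough illustration of the general idea described in the abstract (a pool-based loop in which a standard regression tree fitted on the currently labeled data guides which unlabeled point to query next), the sketch below uses scikit-learn's DecisionTreeRegressor. The query rule shown here (draw the next point from the leaf whose labeled responses have the highest variance) and the names tree_based_active_learning, oracle, and X_pool are illustrative assumptions, not the criterion proposed in the paper.

```python
# Minimal sketch of tree-guided pool-based active learning for regression.
# NOT the authors' algorithm: the leaf-variance query rule is an assumption
# used only to illustrate how a regression tree can drive sample selection.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_based_active_learning(X_pool, oracle, n_init=10, budget=50, seed=0):
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    y = {i: oracle(X_pool[i]) for i in labeled}            # initial labels

    while len(labeled) < budget:
        # Fit a standard regression tree on the labeled data only.
        tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=seed)
        tree.fit(X_pool[labeled], [y[i] for i in labeled])

        # Leaf membership for labeled and unlabeled points.
        leaf_of_labeled = tree.apply(X_pool[labeled])
        unlabeled = [i for i in range(len(X_pool)) if i not in y]
        leaf_of_unlabeled = tree.apply(X_pool[unlabeled])

        # Variance of the labeled responses within each leaf.
        leaf_var = {}
        for leaf in np.unique(leaf_of_labeled):
            vals = [y[i] for i, l in zip(labeled, leaf_of_labeled) if l == leaf]
            leaf_var[leaf] = np.var(vals)

        # Query an unlabeled point from the most "uncertain" leaf;
        # fall back to a random unlabeled point if that leaf is empty.
        target_leaf = max(leaf_var, key=leaf_var.get)
        candidates = [i for i, l in zip(unlabeled, leaf_of_unlabeled)
                      if l == target_leaf]
        new_idx = int(rng.choice(candidates if candidates else unlabeled))

        y[new_idx] = oracle(X_pool[new_idx])               # label the query
        labeled.append(new_idx)

    return labeled, y
```

In this sketch the tree is refit after every query; the labeling function oracle stands in for the (possibly expensive) human or experimental labeling step that active learning tries to minimize.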
Pages: 420-460 (40 pages)
Related papers (50 in total)
  • [1] Regression tree-based active learning
    Jose, Ashna
    de Mendonca, Joao Paulo Almeida
    Devijver, Emilie
    Jakse, Noel
    Monbet, Valerie
    Poloni, Roberta
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2024, 38 (02) : 420 - 460
  • [2] Inductive learning of tree-based regression models
    Torgo, L
    [J]. AI COMMUNICATIONS, 2000, 13 (02) : 137 - 138
  • [3] Tree-based classification and regression Part 3: Tree-based procedures
    Gunter, B
    [J]. QUALITY PROGRESS, 1998, 31 (02) : 121 - 123
  • [4] Tree-based regression for a circular response
    Lund, UJ
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2002, 31 (09) : 1549 - 1560
  • [5] Tree-Based Ensemble Multi-Task Learning Method for Classification and Regression
    Simm, Jaak
    Magrans De Abril, Ildefons
    Sugiyama, Masashi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06) : 1677 - 1681
  • [6] A Tree-Based Solution to Nonlinear Regression Problem
    Demir, Oguzhan
    Mohaghegh, Mohammadreza N.
    Delibalta, Ibrahim
    Kozat, Suleyman S.
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1233 - 1236
  • [7] Tree-based model checking for logistic regression
    Su, Xiaogang
    [J]. STATISTICS IN MEDICINE, 2007, 26 (10) : 2154 - 2169
  • [8] Comparison of tree-based ensemble models for regression
    Park, Sangho
    Kim, Chanmin
    [J]. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2022, 29 (05) : 561 - 590
  • [9] A note on the interpretation of tree-based regression models
    Gottard, Anna
    Vannucci, Giulia
    Marchetti, Giovanni Maria
    [J]. BIOMETRICAL JOURNAL, 2020, 62 (06) : 1564 - 1573
  • [10] Polya tree-based nearest neighborhood regression
    Zhuang, Haoxin
    Diao, Liqun
    Yi, Grace
    [J]. STATISTICS AND COMPUTING, 2022, 32