Reducing the user labeling effort in effective high recall tasks by fine-tuning active learning
Cited by: 3
Authors:
Dal Bianco, Guilherme [1]
Duarte, Denio [1]
Goncalves, Marcos Andre [2]
Affiliations:
[1] Univ Fed Fronteira Sul, Campus Chapeco, Chapeco, Brazil
[2] Univ Fed Minas Gerais, Dept Ciencia Comp, Belo Horizonte, Brazil
Keywords:
Information retrieval;
HIRE;
Active learning;
SSAR;
Labeling process;
Supervised classifier;
SELECTION;
DOI:
10.1007/s10844-022-00772-y
Chinese Library Classification (CLC):
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
High recall Information REtrieval (HIRE) aims to identify all (or almost all) of the relevant documents for a given query, and only those. HIRE is paramount in applications such as systematic literature reviews, medicine, and legal jurisprudence, among others. To address the HIRE goals, active learning methods have proven valuable for selecting informative and non-redundant documents, which reduces the user's manual labeling effort. We propose REVEAL-HIRE, a new active learning framework for the HIRE task. REVEAL-HIRE selects a very small set of documents to be labeled, significantly reducing the user's effort. The proposed approach selects the most representative documents by exploiting a novel active learning strategy designed specifically for HIRE, called REVEAL (RelEVant rulE-based Active Learning). REVEAL aims to select the maximum number of relevant documents for a given query based on discriminative rule-based patterns and a penalization factor. The method is applied to the top-ranked documents to choose the most informative ones to be labeled, a hard task because the data are skewed: most documents are irrelevant to a given query. The enhanced active learning process is repeated incrementally until a stopping point is reached, with REVEAL used to identify the point at which relevant documents should stop being sampled. Experimental results on several standard benchmark datasets (e.g., 20-Newsgroups, TREC Total Recall, and CLEF eHealth) demonstrate that REVEAL-HIRE can reduce the user labeling effort by up to 3 times (a reported 320% reduction) in comparison with state-of-the-art baselines while keeping effectiveness at the highest levels.
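The incremental "label top-ranked documents until a stopping point" loop described in the abstract can be illustrated with a generic sketch. This is not the REVEAL algorithm: the rule-based selection and penalization factor are not specified in this record, so the sketch assumes a fixed score-based ranking and a simple stopping rule (stop after a number of consecutive batches with no relevant document). The names `active_label_loop`, `scores`, `oracle`, `batch_size`, and `patience` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Set


def active_label_loop(
    scores: Dict[str, float],
    oracle: Callable[[str], bool],
    batch_size: int = 2,
    patience: int = 2,
) -> Dict[str, object]:
    """Label top-ranked documents in batches and stop once `patience`
    consecutive batches contain no relevant document.

    This stopping rule is an assumption for illustration, not the
    criterion used by REVEAL-HIRE.
    """
    # Rank documents once by descending score. A real active learner would
    # re-rank after each batch using the labels collected so far.
    ranking: List[str] = sorted(scores, key=lambda d: scores[d], reverse=True)
    relevant: Set[str] = set()
    labeled = 0          # number of documents shown to the user
    empty_batches = 0    # consecutive batches with no relevant document

    for start in range(0, len(ranking), batch_size):
        batch = ranking[start:start + batch_size]
        # The oracle stands in for the human annotator.
        hits = [doc for doc in batch if oracle(doc)]
        labeled += len(batch)
        relevant.update(hits)
        empty_batches = 0 if hits else empty_batches + 1
        if empty_batches >= patience:
            break

    return {"relevant": relevant, "labeled": labeled}
```

In this sketch the user's effort is the `labeled` count, so a better selection strategy or stopping rule shows up directly as fewer labeled documents for the same set of relevant ones.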
Pages: 453-472
Number of pages: 20