Is it worth it? Budget-related evaluation metrics for model selection

Authors
Klubicka, Filip [1]
Salton, Giancarlo D. [1]
Kelleher, John D. [1]
Affiliations
[1] Dublin Inst Technol, Sch Comp, Dublin, Ireland
Keywords
model evaluation; gain; budget; linguistic resource creation; idiom identification; idiom dictionary; F-score
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Projects that set out to create a linguistic resource often use a machine learning model to pre-annotate or filter the content that is passed to a human annotator before it enters the final version of the resource. However, available budgets are often limited, and the amount of available data exceeds the amount of annotation that can be done. Thus, in order to optimize the benefit from the invested human work, we argue that the decision on which predictive model to employ depends not only on generalized evaluation metrics, such as accuracy and F-score, but also on the gain metric. The rationale is that the model with the highest F-score may not have the best separation and sequencing of predicted classes, so more time and/or money is spent annotating false positives, which yield zero improvement of the linguistic resource. We exemplify our point with a case study, using real data from a task of building a verb-noun idiom dictionary. We show that in our scenario, given the choice of three systems with varying F-scores, the system with the highest F-score does not yield the highest profits. In other words, we show that the cost-benefit trade-off can be more favorable if a system with a lower F-score is employed.
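The contrast between F-score and budget-limited gain can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' method: the record does not give the paper's exact definition of gain or profit, so here "gain" is taken to be the number of true positives among the top-ranked candidates that fit the annotation budget, and "profit" is a hypothetical linear value-minus-cost function. The scores, labels, function names, and cost values are all invented for illustration and are not data from the case study.

```python
from typing import List, Tuple

# Each candidate is (model score, is it really an idiom?). All values are invented.
Scored = List[Tuple[float, bool]]


def f1_at_threshold(scored: Scored, threshold: float = 0.5) -> float:
    """F-score when every candidate scoring >= threshold is predicted positive."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hits_within_budget(scored: Scored, budget: int) -> int:
    """True positives among the `budget` highest-scored candidates (assumed gain)."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, is_true in ranked[:budget] if is_true)


def profit(scored: Scored, budget: int, value_per_hit: float, cost_per_item: float) -> float:
    """Hypothetical profit: value of useful annotations minus cost of all annotations."""
    annotated = min(budget, len(scored))
    return hits_within_budget(scored, budget) * value_per_hit - annotated * cost_per_item


# Model A: finds every true idiom above the threshold, but ranks a false positive first.
model_a: Scored = [
    (0.95, False), (0.90, True), (0.60, True), (0.55, True),
    (0.52, True), (0.30, False), (0.20, False), (0.10, False),
]
# Model B: misses half of the true idioms at the threshold, but its top ranks are clean.
model_b: Scored = [
    (0.90, True), (0.80, True), (0.45, True), (0.40, True),
    (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]

if __name__ == "__main__":
    for name, model in (("A", model_a), ("B", model_b)):
        print(
            name,
            round(f1_at_threshold(model), 3),
            hits_within_budget(model, budget=2),
            profit(model, budget=2, value_per_hit=5.0, cost_per_item=1.0),
        )
```

With a budget of two annotations, the hypothetical model A has the higher F-score but spends half of the budget on a false positive, while model B, with the lower F-score, delivers more useful annotations for the same cost.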
Pages: 2014-2021
Page count: 8