Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning

Cited by: 1
Authors
Orth, Thomas [1]
Bloodgood, Michael [1]
Affiliations
[1] Coll New Jersey, Dept Comp Sci, Ewing, NJ 08628 USA
Funding
U.S. National Science Foundation
DOI
10.1109/ICSC.2020.00018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When creating text classification systems, one of the major bottlenecks is the annotation of training data. Active learning has been proposed to address this bottleneck, using stopping methods to minimize the cost of data annotation. An important capability for improving the utility of stopping methods is effectively forecasting the performance of the text classification models. Forecasting can be done with logarithmic models regressed on some portion of the data as learning progresses. A critical unexplored question is what portion of the data is needed for accurate forecasting. There is a tension: using less data allows the forecast to be made earlier, which is more useful, whereas using more data allows the forecast to be more accurate. We find that when using active learning it is even more important to generate forecasts earlier, so as to make them more useful and avoid wasting annotation effort. We investigate the difference in forecasting difficulty when accuracy and F-measure are used as the text classification performance metrics, and we find that F-measure is more difficult to forecast. We conduct experiments on seven text classification datasets from different semantic domains and with different characteristics, using three different base machine learning algorithms. We find that forecasting is easiest for decision tree learning, moderate for Support Vector Machines, and most difficult for neural networks.
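The forecasting idea in the abstract, regressing a logarithmic model on an early portion of the learning curve and extrapolating, can be illustrated with a minimal sketch. This is not the authors' implementation. The model form `score ≈ a + b * ln(n)` is an assumption (the abstract says only "logarithmic models"), the function names `fit_log_model`, `forecast`, and `forecast_error` are hypothetical, and the learning curve is synthetic, generated only for illustration.

```python
import numpy as np

def fit_log_model(n_labeled, scores):
    """Least-squares fit of score ~ a + b * ln(n_labeled) (assumed model form)."""
    x = np.log(np.asarray(n_labeled, dtype=float))
    y = np.asarray(scores, dtype=float)
    b, a = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    return a, b

def forecast(a, b, n):
    """Predicted score after n labeled examples under the fitted model."""
    return a + b * np.log(n)

# Synthetic learning curve (illustrative only, not data from the paper):
# a metric measured after every 100 annotations, up to 2000.
sizes = np.arange(100, 2100, 100)
rng = np.random.default_rng(0)
scores = 0.40 + 0.06 * np.log(sizes) + rng.normal(0.0, 0.005, sizes.size)

def forecast_error(fraction):
    """Fit on the first `fraction` of the curve, then report the absolute
    error of the forecast at the final point of the curve."""
    k = max(2, int(round(fraction * sizes.size)))  # need at least 2 points
    a, b = fit_log_model(sizes[:k], scores[:k])
    return abs(forecast(a, b, sizes[-1]) - scores[-1])

for fraction in (0.1, 0.25, 0.5):
    err = forecast_error(fraction)
    print(f"fit on first {fraction:.0%} of curve: abs. forecast error {err:.4f}")
```

Varying `fraction` exposes the tension the abstract describes: a smaller fraction yields a forecast sooner but from fewer points, while a larger fraction uses more annotation effort before any forecast is available.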
Pages: 77-84
Page count: 8