Confidence Measure for Czech Document Classification

Cited by: 3
Authors
Kral, Pavel [1, 2]
Lenc, Ladislav [1, 2]
Affiliations
[1] Univ W Bohemia, Fac Sci Appl, Dept Comp Sci & Engn, Plzen 30614, Czech Republic
[2] Univ W Bohemia, Fac Sci Appl, NTIS, Plzen 30614, Czech Republic
Keywords
TEXT CLASSIFICATION; RECOGNITION; FEATURES;
DOI
10.1007/978-3-319-18117-2_39
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper deals with automatic document classification in the context of a real application for the Czech News Agency (CTK). The accuracy of our classifier is high; however, it is still important to improve the classification results. The main goal of this paper is thus to propose novel confidence measure approaches to detect and remove incorrectly classified samples. Two of the proposed methods are based on the posterior class probability, and the third is a supervised approach that uses another classifier to determine whether the result is correct. The methods are evaluated on a Czech newspaper corpus. We show experimentally that it is beneficial to integrate the novel approaches into the document classification task because they significantly improve the classification accuracy.
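The abstract does not say how the two posterior-based measures are computed. As a rough illustration only, the sketch below assumes two common choices: the highest posterior itself, and the gap between the two highest posteriors. The function names, the threshold values, and the rejection rule are all hypothetical and are not taken from the paper. The third, supervised approach is not shown.

```python
from typing import Callable, Optional, Sequence


def absolute_confidence(posteriors: Sequence[float]) -> float:
    """Assumed measure 1: the highest posterior class probability."""
    return max(posteriors)


def relative_confidence(posteriors: Sequence[float]) -> float:
    """Assumed measure 2: gap between the two highest posteriors."""
    top, second = sorted(posteriors, reverse=True)[:2]
    return top - second


def classify_with_rejection(
    posteriors: Sequence[float],
    threshold: float,
    measure: Callable[[Sequence[float]], float] = absolute_confidence,
) -> Optional[int]:
    """Return the index of the most probable class, or None when the
    confidence falls below the (hypothetical) threshold."""
    if measure(posteriors) < threshold:
        return None  # sample rejected as likely misclassified
    return max(range(len(posteriors)), key=lambda i: posteriors[i])


if __name__ == "__main__":
    confident = [0.70, 0.20, 0.10]
    ambiguous = [0.45, 0.40, 0.15]
    print(classify_with_rejection(confident, threshold=0.6))
    print(classify_with_rejection(ambiguous, threshold=0.6))
    print(classify_with_rejection(ambiguous, 0.1, relative_confidence))
```

Rejecting low-confidence samples trades coverage for accuracy: the threshold would normally be tuned on held-out data, which this sketch leaves out.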
Pages: 525-534
Number of pages: 10