Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

Cited by: 0
Authors
Kim, Minyoung [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Elect & IT Media Engn, Seoul, South Korea
Keywords
Machine learning; Document/text classification; Term weighting; Optimization
DOI
10.5391/IJFIS.2016.16.2.81
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Term weighting is a popular technique that weights term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform consistently well across different data domains. In this paper, we propose several methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and ii) a minimax classifier that aims for robustness across the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, where they significantly outperform the existing term weighting methods.
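The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the loss or the classifier, so the logistic loss, the linear classifier `theta`, and every name below (`X`, `y`, `W`, `alpha`) are assumptions made for illustration only. The sketch evaluates the objectives and does not include the solution methods the paper mentions.

```python
import numpy as np


def logistic_loss(theta, X, y, w):
    """Mean logistic loss of a linear classifier on term-weighted features.

    X: (N, D) term features, y: (N,) labels in {-1, +1},
    w: (D,) term weight vector, theta: (D,) classifier parameters.
    The logistic loss is an assumption; the abstract names no specific loss.
    """
    margins = y * ((X * w) @ theta)
    return float(np.mean(np.logaddexp(0.0, -margins)))


def convex_combination_loss(alpha, theta, X, y, W):
    """Approach (i): loss under one weight vector in the convex hull of the rows of W."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("alpha must lie on the probability simplex")
    return logistic_loss(theta, X, y, alpha @ W)


def worst_case_loss(theta, X, y, W):
    """Approach (ii): loss of the worst-performing base weight vector (minimax objective)."""
    return max(logistic_loss(theta, X, y, w) for w in W)
```

Under these assumptions, approach (i) would minimize `convex_combination_loss` jointly over `alpha` and `theta`, while approach (ii) would minimize `worst_case_loss` over `theta`. Because this particular loss is convex in the weight vector, the loss at any convex combination never exceeds the worst-case loss for the same `theta`.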
Pages: 81-86
Number of pages: 6