Feature Transformations for Outlier Detection in Classification of Text Documents

Author
Walkowiak, Tomasz [1]
Affiliation
[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland
DOI
10.1007/978-3-031-06746-4_35
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate how feature transformation influences the results of outlier detection for text documents. We tested four outlier detection methods: Local Outlier Factor, Extreme Value Machine, Weibull-calibrated SVM, and the Mahalanobis distance. The analyzed text documents are represented by different feature vectors, ranging from TF-IDF, through two types of averaged word embeddings, to document embeddings generated by the BERT network. Experimenting on two different text corpora, we show how a transformation of the feature space (the vector representation of documents) influences the outlier detection results.
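As a rough illustration of why a feature-space transformation can change outlier scores, the following minimal sketch compares the distance to the training centroid before and after per-feature standardization. The abstract does not say which transformations the paper tested, so standardization is an assumption made here for illustration, and the toy data and the helper names (column_stats, standardize, centroid_distance) are hypothetical. Euclidean distance on standardized features equals the Mahalanobis distance under a diagonal covariance matrix, which is a simplification of the full Mahalanobis distance named in the abstract.

```python
import math


def column_stats(rows):
    """Return per-feature means and population standard deviations."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        # Fall back to 1.0 for constant features to avoid division by zero.
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]
    return means, stds


def standardize(x, means, stds):
    """Assumed transformation: shift and scale each feature independently."""
    return [(xj - mj) / sj for xj, mj, sj in zip(x, means, stds)]


def centroid_distance(x, centroid):
    """Euclidean distance to a centroid, used here as a simple outlier score."""
    return math.sqrt(sum((xj - cj) ** 2 for xj, cj in zip(x, centroid)))


# Hypothetical in-distribution data: feature 0 varies widely, feature 1 barely.
train = [[10.0, 1.0], [20.0, 1.1], [30.0, 0.9], [40.0, 1.0], [50.0, 1.0]]
means, stds = column_stats(train)

typical = [45.0, 1.0]  # within the training range on both features
unusual = [30.0, 2.0]  # far outside the training range on feature 1

# Scores in the raw feature space.
raw_typical = centroid_distance(typical, means)
raw_unusual = centroid_distance(unusual, means)

# Scores after standardization (the centroid becomes the origin).
origin = [0.0] * len(means)
std_typical = centroid_distance(standardize(typical, means, stds), origin)
std_unusual = centroid_distance(standardize(unusual, means, stds), origin)

print(f"raw:          typical={raw_typical:.2f}  unusual={raw_unusual:.2f}")
print(f"standardized: typical={std_typical:.2f}  unusual={std_unusual:.2f}")
```

In the raw space the high-variance feature dominates the distance, so the typical point scores as more anomalous than the unusual one; after standardization the ranking reverses. This is only a toy demonstration of the general effect, not a reproduction of the paper's experiments.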
Pages: 361-370
Number of pages: 10
Related papers
50 in total
  • [1] Feature selection and text classification for Chinese web documents
    Xu, JC
    Liu, DY
    Hu, M
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1304 - 1309
  • [2] Feature Extraction in Subject Classification of Text Documents in Polish
    Walkowiak, Tomasz
    Datko, Szymon
    Maciejewski, Henryk
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2018), PT II, 2018, 10842 : 445 - 452
  • [3] Hierarchical approach to select feature vectors for classification of text documents
    Kapalavayi, Nagesh
    Murthy, S. N. Jayaram
    Hu, Gongzhu
    2006 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1-3, 2006, : 1179 - +
  • [4] Classification of text documents
    Li, YH
    Jain, AK
    FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 1998, : 1295 - 1297
  • [5] Classification of text documents
    Li, YH
    Jain, AK
    COMPUTER JOURNAL, 1998, 41 (08): : 537 - 546
  • [6] A feature mining based approach for the classification of text documents into disjoint classes
    Sánchez, SN
    Triantaphyllou, E
    Kraft, D
    INFORMATION PROCESSING & MANAGEMENT, 2002, 38 (04) : 583 - 604
  • [7] Variable Global Feature Selection Scheme for automatic classification of text documents
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 81 : 268 - 281
  • [8] Text classification from unlabeled documents with bootstrapping and feature projection techniques
    Ko, Youngjoong
    Seo, Jungyun
    INFORMATION PROCESSING & MANAGEMENT, 2009, 45 (01) : 70 - 83
  • [9] A Novel Outlier Detection with Feature Selection Enabled Streaming Data Classification
    Rajakumar, R.
    Devi, S. Sathiya
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02): : 2101 - 2116
  • [10] Outlier detection in classification based on feature-selection-based regression
    Su, Jinxia
    Liu, Qiwen
    Cui, Jingke
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (02) : 1399 - 1414