Feature Transformations for Outlier Detection in Classification of Text Documents

Author
Walkowiak, Tomasz [1]
Affiliation
[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland
DOI
10.1007/978-3-031-06746-4_35
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the influence of feature transformation on the results of outlier detection of text documents. We tested four outlier detection methods: Local Outlier Factor, Extreme Value Machine, Weibull-calibrated SVM, and the Mahalanobis distance. The analyzed text documents are represented by different feature vectors ranging from TF-IDF, through averaged word embedding (two types), to document embedding generated by the BERT network. Experimenting on two different text corpora, we show how a transformation of the feature space (vector representation of documents) influences the outlier detection results.
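As a rough illustration of the idea summarized in the abstract, the sketch below scores two test vectors with the Mahalanobis distance before and after a feature transformation. It is not the paper's pipeline: the synthetic vectors stand in for document features, L2 normalization is only an assumed example of a transformation (the abstract does not name the ones studied), and the helper names and the `ridge` parameter are hypothetical.

```python
import numpy as np

def l2_normalize(X):
    """Scale each row to unit Euclidean length (an assumed example transformation)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

def fit_mahalanobis(X_train, ridge=1e-6):
    """Estimate the mean and a ridge-regularized inverse covariance of the training data."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])  # keeps the matrix invertible
    return mean, np.linalg.inv(cov)

def mahalanobis_score(X, mean, inv_cov):
    """Mahalanobis distance of each row of X from the training mean (higher = more outlying)."""
    diff = X - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

# Synthetic stand-in for in-distribution document vectors (assumption, not real text data).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=1.0, scale=0.1, size=(200, 5))

# Two test vectors: one near the training mean, one with the same direction but 3x the length.
X_test = np.vstack([np.full(5, 1.0), np.full(5, 3.0)])

# Scores in the original feature space.
mean, inv_cov = fit_mahalanobis(X_train)
raw_scores = mahalanobis_score(X_test, mean, inv_cov)

# Scores after the transformation is applied to both training and test data.
mean_n, inv_cov_n = fit_mahalanobis(l2_normalize(X_train))
norm_scores = mahalanobis_score(l2_normalize(X_test), mean_n, inv_cov_n)

print("raw scores:       ", raw_scores)
print("normalized scores:", norm_scores)
```

In the original space the second vector is far more outlying than the first. After L2 normalization both vectors map to the same point, so their scores coincide, which shows in a toy setting how the choice of feature space can change what a detector flags.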
Pages: 361-370
Page count: 10