Anomaly detection in discussion forum posts using Global Vectors

Cited by: 0
Authors
Cichosz, Pawel [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Nowowiejska 15-19, PL-00665 Warsaw, Poland
Keywords
anomaly detection; text classification; text clustering; word embeddings; VALIDATION; SUPPORT; ONLINE
DOI
10.1117/12.2501345
Chinese Library Classification
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a potentially promising but relatively uncommon application of anomaly detection to text data. A Polish Internet discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana, serves as a text source that is both realistic and possibly interesting in its own right, due to potential associations with drug-related crime. Forum posts are preprocessed by stopword removal, spelling correction, stemming, and frequency-based term filtering. The Global Vectors (GloVe) text representation, an example of the increasingly popular word embedding approach, is combined with two unsupervised anomaly detection algorithms: one based on one-class SVM classification and one based on dissimilarity to k-medoids clusters. The cluster dissimilarity approach combined with the GloVe representation outperforms one-class SVM with respect to detection quality and appears to be the more promising approach to anomaly detection in text data.
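The cluster dissimilarity idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny 2-D `embeddings` table stands in for trained GloVe vectors, posts are represented by averaging their word vectors, dissimilarity is Euclidean distance, and `k_medoids` is a simple alternating variant rather than whatever implementation the author used. The one-class SVM alternative is not shown.

```python
import math
import random

def post_vector(tokens, embeddings):
    """Average the embedding vectors of the tokens that have one (assumed representation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("post contains no known tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, n_iter=20, seed=0):
    """Alternate between assigning points to medoids and re-picking each medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster ended up empty
                new_medoids.append(m)
                continue
            # the new medoid is the member with the smallest total distance to the others
            best = min(members,
                       key=lambda c: sum(euclidean(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]

def anomaly_score(vec, medoids):
    """Dissimilarity to the closest cluster medoid; larger means more anomalous."""
    return min(euclidean(vec, m) for m in medoids)

# Hypothetical toy embeddings standing in for trained GloVe vectors.
embeddings = {
    "plant": [1.0, 0.0], "seed": [0.9, 0.1], "grow": [1.0, 0.2],
    "light": [0.0, 1.0], "lamp": [0.1, 0.9], "watt": [0.2, 1.0],
    "police": [5.0, 5.0], "arrest": [5.2, 4.8],
}
# Hypothetical, already preprocessed "historical" posts.
history = [["plant", "seed"], ["seed", "grow"], ["plant", "grow"],
           ["light", "lamp"], ["lamp", "watt"], ["light", "watt"]]

medoids = k_medoids([post_vector(p, embeddings) for p in history], k=2)
typical = anomaly_score(post_vector(["plant", "seed", "grow"], embeddings), medoids)
unusual = anomaly_score(post_vector(["police", "arrest"], embeddings), medoids)
print(f"typical post score: {typical:.3f}, unusual post score: {unusual:.3f}")
```

In this sketch a new post is flagged when its score exceeds a threshold chosen from the historical scores; how that threshold is set is left open here.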
Pages: 12
Related papers
50 in total
  • [3] Automatic Classification of Forum Posts: A Finnish Online Health Discussion Forum Case
    Gencoglu, O.
    EMBEC & NBC 2017, 2018, 65 : 169 - 172
  • [4] Mental distress and language use: Linguistic analysis of discussion forum posts
    Lyons, Minna
    Aksayli, Nazli Deniz
    Brewer, Gayle
    COMPUTERS IN HUMAN BEHAVIOR, 2018, 87 : 207 - 211
  • [5] Behaviour Profiling of Reactions in Facebook Posts for Anomaly Detection
    Savyan, P., V
    Bhanu, S. Mary Saira
    2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2017, : 220 - 226
  • [6] Behaviour Profiling of Reactions in Facebook Posts for Anomaly Detection
    Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India
    Int. Conf. Adv. Comput., ICoAC, 1600, (220-226):
  • [7] Automatic Detection of Psychological Distress Indicators in Online Forum Posts
    Saleem, Shirin
    Pacula, Maciej
    Chasin, Rachel
    Kumar, Rohit
    Prasad, Rohit
    Crystal, Michael
    Marx, Brian
    Sloan, Denise
    Vasterling, Jennifer
    Speroff, Theodore
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [8] Hyperspectral Anomaly Detection Using Enhanced Global Factors
    Paciencia, Todd J.
    Bauer, Kenneth W., Jr.
    AUTOMATIC TARGET RECOGNITION XXVI, 2016, 9844
  • [9] Cooperative Sensor Anomaly Detection Using Global Information
    Zhang, Rui
    Ji, Ping
    Mylaraswamy, Dinkar
    Srivastava, Mani
    Zahedi, Sadaf
    TSINGHUA SCIENCE AND TECHNOLOGY, 2013, 18 (03) : 209 - 219
  • [10] Cooperative Sensor Anomaly Detection Using Global Information
    Rui Zhang
    Ping Ji
    Dinkar Mylaraswamy
    Mani Srivastava
    Sadaf Zahedi
    Tsinghua Science and Technology, 2013, 18 (03) : 209 - 219