Anomaly detection in discussion forum posts using Global Vectors

Cited by: 0
Authors
Cichosz, Pawel [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Nowowiejska 15-19, PL-00665 Warsaw, Poland
Keywords
anomaly detection; text classification; text clustering; word embeddings; VALIDATION; SUPPORT; ONLINE;
DOI
10.1117/12.2501345
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207 ; 0803 ;
Abstract
Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a potentially promising but relatively uncommon application of anomaly detection to text data. A Polish Internet discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana, serves as a text source that is both realistic and potentially interesting in its own right, due to possible associations with drug-related crime. Forum posts are preprocessed by stopword removal, spelling correction, stemming, and frequency-based term filtering. The Global Vectors (GloVe) text representation, an example of the increasingly popular word embedding approach, is combined with two unsupervised anomaly detection algorithms: one based on one-class SVM classification and one based on dissimilarity to k-medoids clusters. The cluster dissimilarity approach combined with the GloVe representation outperforms one-class SVM with respect to detection quality and appears to be a more promising approach to anomaly detection in text data.
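The cluster dissimilarity idea summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the two-dimensional vectors in `EMB` are made-up stand-ins for real GloVe embeddings, representing a post by the average of its word vectors is an assumption, and scoring a post by its distance to the nearest medoid is one plausible reading of "dissimilarity to k-medoids clusters". Preprocessing (stopword removal, spelling correction, stemming, term filtering) is omitted, and all function names are hypothetical.

```python
import math
import random

# Toy 2-D "embeddings": hypothetical values, NOT real GloVe vectors.
EMB = {
    "plant": (1.0, 0.1), "seed": (0.9, 0.2), "grow": (0.8, 0.0),
    "soil": (1.0, 0.3), "light": (0.7, 0.2),
    "police": (0.0, 1.0), "court": (0.1, 0.9),
}

def doc_vector(tokens):
    """Represent a post as the mean of its known word vectors (assumed pooling)."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        raise ValueError("no token of this post has an embedding")
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, iters=20, seed=0):
    """Simple alternating k-medoids: assign points, then re-pick each medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, medoids[j]))
            clusters[nearest].append(p)
        updated = []
        for j, members in enumerate(clusters):
            if not members:  # keep the old medoid if its cluster emptied
                updated.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(members, key=lambda m: sum(dist(m, q) for q in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids

def anomaly_score(point, medoids):
    """Dissimilarity to the clustering: distance to the nearest medoid (assumed)."""
    return min(dist(point, m) for m in medoids)

# "Historical" posts used to build the model (already tokenized).
train_posts = [
    ["plant", "seed"], ["grow", "soil"], ["seed", "light", "grow"],
    ["soil", "plant", "light"], ["grow", "plant"],
]
train_vecs = [doc_vector(p) for p in train_posts]
medoids = k_medoids(train_vecs, k=2)

# New posts: one on the usual topic, one about something else.
score_normal = anomaly_score(doc_vector(["plant", "seed", "light"]), medoids)
score_odd = anomaly_score(doc_vector(["police", "court"]), medoids)
print(score_normal, score_odd)
```

In this toy setup the off-topic post lies far from every medoid and so receives the higher score. A real system would additionally need a threshold or ranking rule to decide which scores count as anomalous, which the abstract does not specify.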
Pages: 12
Related papers
50 in total
  • [31] Global anomaly crowd behavior detection using crowd behavior feature vector
    Yin, Yong
    Liu, Qiannan
    Mao, Shibiao
    International Journal of Smart Home, 2015, 9 (12) : 149 - 160
  • [32] Climate change in the Himalaya: Views of Mountain Forum members voiced on the Mountain Forum global discussion list
    Sherchan, U
    Sharma, P
    MOUNTAIN RESEARCH AND DEVELOPMENT, 2005, 25 (04) : 384 - 385
  • [33] Using Content-Based Features for Author Profiling of Vietnamese Forum Posts
    Duc Tran Duong
    Son Bao Pham
    Hanh Tan
    RECENT DEVELOPMENTS IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2016, 642 : 287 - 296
  • [34] Improving Learning Experience: Detection of Team Roles in a Discussion Forum
    Bermejo, Miren
    Sanchez, Ana
    PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON E-LEARNING, 2009, : 52 - +
  • [35] Global Information Guided Video Anomaly Detection
    Lv, Hui
    Xu, Chunyan
    Cui, Zhen
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4679 - 4683
  • [36] Splog Detection Using Structural Similarity between Posts and URL Biasedness in Posts
    Kim, Soo-Cheol
    Lee, Su-Won
    Sung, Kyoung-Jun
    Kim, Sung Kwon
    JOURNAL OF INTERNET TECHNOLOGY, 2012, 13 (05) : 767 - 772
  • [37] An Intelligent Anomaly Detection Method for Rotating Machinery Based on Vibration Vectors
    Hu, Di
    Zhang, Chen
    Yang, Tao
    Chen, Gang
    IEEE SENSORS JOURNAL, 2022, 22 (14) : 14294 - 14305
  • [38] Anomaly Detection Using Ensembles
    Shoemaker, Larry
    Hall, Lawrence O.
    MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 6 - 15
  • [39] Anomaly detection using topology
    Basener, Bill
    Ientilucci, Emmett J.
    Messinger, David W.
    ALGORITHMS AND TECHNOLOGIES FOR MULTISPECTRAL, HYPERSPECTRAL, AND ULTRASPECTRAL IMAGERY XIII, 2007, 6565
  • [40] Improving Mental Health using Machine Learning to Assist Humans in the Moderation of Forum Posts
    Wang, Dong
    Weeds, Julie
    Comley, Ian
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 5: HEALTHINF, 2020, : 187 - 197