Anomaly detection in discussion forum posts using Global Vectors

Author
Cichosz, Pawel [1]
Affiliation
[1] Warsaw Univ Technol, Inst Comp Sci, Nowowiejska 15-19, PL-00665 Warsaw, Poland
Keywords
anomaly detection; text classification; text clustering; word embeddings; VALIDATION; SUPPORT; ONLINE
DOI
10.1117/12.2501345
Chinese Library Classification (CLC) number
O43 [Optics]
Subject classification code
070207; 0803
Abstract
Anomaly detection can be seen as an unsupervised learning task in which a predictive model built on historical data is used to detect outlying instances in new data. This work addresses a possibly promising but relatively uncommon application of anomaly detection to text data. A Polish Internet discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana, serves as a text source that is both realistic and potentially interesting in its own right, given its possible associations with drug-related crime. Forum posts are preprocessed by stopword removal, spelling correction, stemming, and frequency-based term filtering. The Global Vectors (GloVe) text representation, an example of the increasingly popular word embedding approach, is combined with two unsupervised anomaly detection algorithms: one based on one-class SVM classification and one based on dissimilarity to k-medoids clusters. The cluster dissimilarity approach combined with the GloVe representation outperforms one-class SVM in detection quality and appears to be the more promising approach to anomaly detection in text data.
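The cluster-dissimilarity idea described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. Toy 2-d vectors stand in for pre-trained GloVe embeddings, and a post is represented by the average of its word vectors. Only stopword removal is shown; spelling correction, stemming, and frequency-based filtering are omitted. The anomaly score is the Euclidean distance to the nearest k-medoids medoid, and a new post is flagged when its score exceeds the largest score seen on the historical posts. All names (`WORD_VECTORS`, `STOPWORDS`, `doc_vector`, `k_medoids`, `anomaly_score`, `detect`) and all data are hypothetical, so this should not be read as the author's implementation.

```python
import math

# Hypothetical 2-d "word vectors" standing in for pre-trained GloVe embeddings
# (real GloVe vectors typically have 50-300 dimensions).
WORD_VECTORS = {
    "seed": (1.0, 0.1), "grow": (0.9, 0.2), "lamp": (0.8, 0.0),
    "soil": (1.0, 0.0), "water": (0.9, 0.1),
    "bike": (0.0, 1.0), "wheel": (0.1, 0.9),
}
STOPWORDS = {"the", "a", "and", "my", "is"}  # hypothetical stopword list


def doc_vector(text):
    """Average the vectors of in-vocabulary, non-stopword tokens (assumed aggregation)."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None  # no known words: no embedding available
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0])))


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids: assign points to the nearest medoid, then move each
    medoid to the cluster member with the smallest total distance to the others."""
    medoid_idx = list(range(k))  # deterministic initialisation, for the sketch only
    for _ in range(max_iter):
        clusters = [[] for _ in medoid_idx]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, points[medoid_idx[c]]))
            clusters[nearest].append(i)
        new_idx = [
            min(members, key=lambda m: sum(math.dist(points[m], points[o]) for o in members))
            if members else medoid_idx[c]  # keep the old medoid if a cluster empties
            for c, members in enumerate(clusters)
        ]
        if new_idx == medoid_idx:
            break  # medoids are stable: converged
        medoid_idx = new_idx
    return [points[i] for i in medoid_idx]


def anomaly_score(vec, medoids):
    """Dissimilarity to the closest medoid; posts without an embedding score infinity."""
    if vec is None:
        return math.inf
    return min(math.dist(vec, m) for m in medoids)


def detect(train_posts, new_posts, k):
    """Flag new posts whose score exceeds the largest score seen on training posts."""
    train_vecs = [v for v in map(doc_vector, train_posts) if v is not None]
    medoids = k_medoids(train_vecs, k)
    threshold = max(anomaly_score(v, medoids) for v in train_vecs)
    return [anomaly_score(doc_vector(p), medoids) > threshold for p in new_posts]


TRAIN = ["the seed and the soil", "grow my seed", "water the soil",
         "grow lamp", "my lamp and water"]
print(detect(TRAIN, ["grow the seed", "my bike wheel"], k=2))  # prints [False, True]
```

Using the maximum training score as the threshold is only one simple choice. A quantile of the training scores would tolerate some outliers already present in the historical posts.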
Pages: 12