Anomaly detection in discussion forum posts using Global Vectors

Cited by: 0
Authors
Cichosz, Pawel [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Nowowiejska 15-19, PL-00665 Warsaw, Poland
Keywords
anomaly detection; text classification; text clustering; word embeddings; VALIDATION; SUPPORT; ONLINE
DOI
10.1117/12.2501345
Chinese Library Classification number
O43 [Optics]
Discipline classification codes
070207; 0803
Abstract
Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses a possibly promising but relatively uncommon application of anomaly detection to text data. A Polish Internet discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana, serves as a text source that is both realistic and possibly interesting in its own right, due to potential associations with drug-related crime. Forum posts are preprocessed by stopword removal, spelling correction, stemming, and frequency-based term filtering. The Global Vectors (GloVe) text representation, an example of the increasingly popular word embedding approach, is combined with two unsupervised anomaly detection algorithms: one based on one-class SVM classification and one based on dissimilarity to k-medoids clusters. The cluster dissimilarity approach combined with the GloVe representation outperforms one-class SVM with respect to detection quality and appears to be a more promising approach to anomaly detection in text data.
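The cluster dissimilarity idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny two-dimensional `TOY_EMBEDDINGS` table is a hypothetical stand-in for trained GloVe vectors, averaging word vectors into a post vector is an assumption (the abstract does not say how post representations are built), and Euclidean distance with a simple alternating k-medoids procedure is likewise assumed. The one-class SVM variant is not shown.

```python
import math
import random

# Hypothetical 2-D word vectors standing in for trained GloVe embeddings.
TOY_EMBEDDINGS = {
    "plant": [1.0, 0.0], "seed": [0.9, 0.1], "soil": [0.8, 0.2],
    "light": [0.0, 1.0], "lamp": [0.1, 0.9], "watt": [0.2, 0.8],
    "invoice": [-1.0, -1.0], "shipping": [-0.9, -1.1],
}

def doc_vector(tokens, embeddings, dim=2):
    """Average the vectors of known tokens (assumed aggregation scheme)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, n_iter=20, seed=0):
    """Alternate between assigning points and re-picking each cluster medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep a medoid whose cluster ended up empty
                new_medoids.append(m)
                continue
            # The medoid is the member with the smallest total distance to the rest.
            best = min(members, key=lambda c: sum(
                euclidean(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]

def anomaly_score(vec, medoids):
    """Dissimilarity to the nearest medoid; larger means more anomalous."""
    return min(euclidean(vec, m) for m in medoids)

# "Historical" posts, already tokenized and preprocessed.
history = [["plant", "seed"], ["seed", "soil"], ["plant", "soil"],
           ["light", "lamp"], ["lamp", "watt"], ["light", "watt"]]
medoids = k_medoids([doc_vector(d, TOY_EMBEDDINGS) for d in history], k=2)

typical = anomaly_score(doc_vector(["plant", "soil", "seed"], TOY_EMBEDDINGS), medoids)
unusual = anomaly_score(doc_vector(["invoice", "shipping"], TOY_EMBEDDINGS), medoids)
print(typical, unusual)
```

In this sketch a new post is flagged when its score exceeds a threshold chosen on historical data; how that threshold is set is left open here.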
Pages: 12
Related papers
50 in total
  • [21] Untangling chaos in discussion forums: A temporal analysis of topic-relevant forum posts in MOOCs
    Yang, Bokai
    Tang, Hengtao
    Hao, Ling
    Rose, John R.
    COMPUTERS & EDUCATION, 2022, 178
  • [22] Leveraging Stock Discussion Forum Posts for Stock Price Predictions: Focusing on the Secondary Battery Sector
    You, Jisoo
    Jang, Haryeom
    Kang, Minsuk
    Yang, Sung-Byung
    Yoon, Sang-Hyeak
    IEEE ACCESS, 2024, 12 : 153537 - 153549
  • [23] GLAD: GLOBAL AND LOCAL ANOMALY DETECTION
    Nie, Lihai
    Zhao, Laiping
    Li, Keqiu
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
  • [24] Discussion forum: Global health D&I research
    Neta, Gila
    Vinson, Cynthia
    IMPLEMENTATION SCIENCE, 2016, 11
  • [25] Use of global context for handling noisy names in discussion texts of a homeopathy discussion forum
    Majumder, Mukta
    Saha, Sujan Kumar
    KNOWLEDGE MANAGEMENT & E-LEARNING-AN INTERNATIONAL JOURNAL, 2014, 6 (01) : 18 - 29
  • [26] Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts
    Onan, Aytug
    Tocoglu, Mansur Alp
    COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, 2021, 29 (04) : 675 - 689
  • [27] Learning Together for Mastery by Using a Discussion Forum
    Amano, Kei
    Tsuzuku, Shigeki
    Suzuki, Katsuaki
    Hiraoka, Naoshi
    2019 INTERNATIONAL SYMPOSIUM ON EDUCATIONAL TECHNOLOGY (ISET 2019), 2019, : 165 - 169
  • [28] Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach
    Abdellaoui, Redhouane
    Foulquie, Pierre
    Texier, Nathalie
    Faviez, Carole
    Burgun, Anita
    Schuck, Stephane
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2018, 20 (03)
  • [29] Removing Noise (Opinion Messages) for Fake News Detection in Discussion Forum Using BERT Model
    Ip, Cheuk Yu
    Li, Fu Kay Frankie
    Lam, Yi Anson
    Yiu, Siu Ming
    DIGITAL FORENSICS AND CYBER CRIME, PT 1, ICDF2C 2023, 2024, 570 : 78 - 95
  • [30] Anomaly Detection in Crowded Scenarios Using Local and Global Gaussian Mixture Models
    Tome, Adrian
    Salgado, Luis
    ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS (ACIVS 2017), 2017, 10617 : 363 - 374