Unsupervised Anomaly Detection in Sequential Process Data

Cited by: 1
Authors
Bulut, Okan [1, 3]
Gorgun, Guher [2]
He, Surina [2]
Affiliations
[1] Univ Alberta, Ctr Res Appl Measurement & Evaluat, Edmonton, AB, Canada
[2] Univ Alberta, Measurement Evaluat & Data Sci, Edmonton, AB, Canada
[3] Univ Alberta, Fac Educ, Ctr Res Appl Measurement & Evaluat, 6-110 Educ Ctr North,11210 87 Ave NW, Edmonton, AB T6G 2G5, Canada
Source
Keywords
action sequences; technology-rich items; PIAAC; anomaly detection; BERT; Isolation Forest; SEQUENCES
DOI
10.1027/2151-2604/a000558
Chinese Library Classification
B84 [Psychology]
Discipline classification codes
04; 0402
Abstract
In this study, we present three unsupervised anomaly detection methods for identifying anomalous test-takers based on their action sequences in problem-solving tasks. The first method applies the Isolation Forest algorithm to raw action sequences extracted from process data. The second method transforms raw action sequences into contextual embeddings using the Bidirectional Encoder Representations from Transformers (BERT) model and then applies the Isolation Forest algorithm to the embeddings. The third method follows the same procedure as the second but adds an intermediate dimensionality reduction step for the contextual embeddings before applying the Isolation Forest algorithm. To compare the three methods, we analyze the log files of test-takers in the US sample (n = 2,021) who completed the problem-solving in technology-rich environments (PSTRE) section of the Programme for the International Assessment of Adult Competencies (PIAAC) 2012 assessment. The results indicated that different groups of test-takers were flagged as anomalous depending on the representation (raw action sequences vs. contextual embeddings) and dimensionality of the action sequences. Moreover, when the contextual embeddings were used, the Isolation Forest algorithm flagged a larger number of test-takers, indicating the sensitivity of this algorithm to the dimensionality of the input data.
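The idea behind the first method can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each action sequence is encoded as a vector of action counts (the abstract does not specify the encoding), uses made-up action names, and replaces a library Isolation Forest with a simplified pure-Python version so the example has no dependencies. The function names `encode`, `build_tree`, `path_length`, and `anomaly_scores` are hypothetical. The BERT embedding and dimensionality reduction steps of the second and third methods are not shown.

```python
import math
import random
from collections import Counter


def encode(sequences):
    """Encode each action sequence as action counts (an assumed encoding)."""
    vocab = sorted({action for seq in sequences for action in seq})
    return [[Counter(seq)[action] for action in vocab] for seq in sequences]


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random non-constant feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    varying = [j for j in range(len(rows[0]))
               if min(r[j] for r in rows) != max(r[j] for r in rows)]
    if not varying:  # all remaining rows are identical
        return {"size": len(rows)}
    feat = rng.choice(varying)
    split = rng.uniform(min(r[feat] for r in rows), max(r[feat] for r in rows))
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {"feat": feat, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if "feat" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feat"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return scores in (0, 1); points that are isolated quickly score higher."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return [2 ** (-sum(path_length(x, t) for t in trees) / n_trees / c(psi))
            for x in rows]


# Toy data: 30 similar sequences plus one unusual sequence at index 30.
sequences = [["START", "OPEN_EMAIL", "REPLY", "NEXT"] for _ in range(20)]
sequences += [["START", "OPEN_EMAIL", "OPEN_EMAIL", "REPLY", "NEXT"] for _ in range(10)]
sequences.append(["START"] + ["CLICK_HELP"] * 15 + ["NEXT"])

scores = anomaly_scores(encode(sequences))
flagged = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous test-taker index:", flagged)
```

On this toy data the unusual sequence is separated from the rest within one or two splits in every tree, so it receives the highest score. With real process data, a maintained implementation such as scikit-learn's `IsolationForest` would be the more practical choice, and the score threshold for flagging test-takers would need to be chosen deliberately.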
Pages: 74-94
Page count: 21