Evaluating multivariate time-series clustering using simulated ecological momentary assessment data

Cited by: 1

Authors:

Ntekouli, Mandani [1]

Spanakis, Gerasimos [1]

Waldorp, Lourens [2]

Roefs, Anne [3]

Affiliations:

[1] Maastricht Univ, Dept Adv Comp Sci, Maastricht, Netherlands

[2] Univ Amsterdam, Dept Psychol Methods, Amsterdam, Netherlands

[3] Maastricht Univ, Fac Psychol & Neurosci, Maastricht, Netherlands

Source:

MACHINE LEARNING WITH APPLICATIONS | 2023 / Vol. 14

Funding:

Dutch Research Council;

Keywords:

Multivariate time-series; Ecological momentary assessment; Clustering; Distance time warping; Global alignment kernel; KERNEL;

DOI:

10.1016/j.mlwa.2023.100512

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

During an Ecological Momentary Assessment (EMA) study, repeated digital questionnaires make it possible to collect multivariate time-series (MTS) data for every participant. Although such data are commonly analyzed per participant, the richness of these datasets raises the question of whether meaningful groups of individuals can be uncovered to better understand the underlying processes at both the individual and the group level. Such groups can be obtained by clustering. This paper therefore examines the performance of various clustering approaches for grouping individuals based on the similarity of their raw time-series patterns. Clustering is an unsupervised task in which the true underlying groups are usually not available, which makes the result difficult to evaluate. For this reason, simulated irregular time-series data resembling EMA are used to validate the performance of several methods under different clustering-related choices, such as the distance metric. Data are generated with a varying number of clusters, total number of individuals and time-points, number of variables, and proportion of noisy variables, while the time-series represent well-shaped patterns typically observed in emotional behavior. After clustering was applied to all simulated datasets, performance was first assessed by comparing the true and predicted labels, and the impact of the different dataset parameters was also examined. Because ground-truth labels are not always available, or do not even exist, in real-world scenarios, clustering evaluation through distance-based and distance-free measures was further investigated. Overall, all clustering methods (e.g., k-means, hierarchical clustering, fuzzy k-medoids) proved reliable in different configurations, revealing the true number of clusters. Moreover, kernel-based methods appeared more efficient when highly noisy variables were involved, making them more promising for real-world data. In a second part, two specific simulated scenarios (datasets) are illustrated, showing in more detail all the analysis steps taken before drawing a conclusion about the optimal number of clusters.
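
The evaluation idea in the abstract (simulate series with known groups, cluster them using a time-series distance, then compare true and predicted labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the two pattern shapes, the noise level, the unequal series lengths standing in for irregular sampling, the choice of dynamic time warping as the distance, average-linkage hierarchical clustering, and the adjusted Rand index as the label-agreement measure.

```python
import math
import random
from collections import Counter
from itertools import combinations

def dtw_distance(a, b):
    """Multivariate DTW between two series (lists of per-time-point tuples)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance across all variables at one alignment step.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def simulate(pattern, length, rng, noise=0.1):
    """Hypothetical generator: pattern 0 is one sine cycle, pattern 1 a linear ramp."""
    series = []
    for t in range(length):
        base = math.sin(2 * math.pi * t / length) if pattern == 0 else t / length
        series.append((base + rng.gauss(0, noise), -base + rng.gauss(0, noise)))
    return series

def average_linkage(dist, k):
    """Agglomerative clustering on a precomputed distance matrix, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[p] for j in clusters[q]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters[q]  # p < q, so deleting q keeps index p valid
        del clusters[q]
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def adjusted_rand_index(true, pred):
    """Chance-corrected agreement between two labelings (1.0 means identical partitions)."""
    comb2 = lambda x: x * (x - 1) / 2
    sum_ij = sum(comb2(v) for v in Counter(zip(true, pred)).values())
    sum_a = sum(comb2(v) for v in Counter(true).values())
    sum_b = sum(comb2(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / comb2(len(true))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

rng = random.Random(0)
true_labels = [0] * 6 + [1] * 6
# Unequal lengths mimic individuals with different numbers of completed assessments.
data = [simulate(lab, rng.randint(20, 30), rng) for lab in true_labels]
dist = [[dtw_distance(s, t) for t in data] for s in data]
pred_labels = average_linkage(dist, k=2)
ari = adjusted_rand_index(true_labels, pred_labels)
print("Adjusted Rand index:", ari)
```

Working from a precomputed distance matrix keeps the distance choice separate from the clustering step, so a different distance or a kernel-derived dissimilarity could be swapped in without touching the rest of the sketch.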

Pages: 19
