Evaluating multivariate time-series clustering using simulated ecological momentary assessment data

Cited by: 1
Authors
Ntekouli, Mandani [1]
Spanakis, Gerasimos [1]
Waldorp, Lourens [2]
Roefs, Anne [3]
Affiliations
[1] Maastricht Univ, Dept Adv Comp Sci, Maastricht, Netherlands
[2] Univ Amsterdam, Dept Psychol Methods, Amsterdam, Netherlands
[3] Maastricht Univ, Fac Psychol & Neurosci, Maastricht, Netherlands
Source
Funding
Dutch Research Council;
Keywords
Multivariate time-series; Ecological momentary assessment; Clustering; Dynamic time warping; Global alignment kernel; KERNEL;
DOI
10.1016/j.mlwa.2023.100512
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
During an Ecological Momentary Assessment (EMA) study, repeated digital questionnaires make it possible to collect multivariate time-series (MTS) data for every participant. Although such data are commonly analyzed per participant, the richness of these datasets raises the question of whether meaningful groups of individuals can be uncovered to better understand the underlying processes at both the individual and the group level. Such groups can be obtained by clustering. This paper therefore examines the performance of various clustering approaches for grouping individuals based on the similarity of their raw time-series patterns. Clustering is an unsupervised task in which the true underlying groups are usually unavailable, which makes the result difficult to evaluate. For this reason, simulated irregular time-series data resembling EMA are used to validate the performance of several methods under different clustering-related choices, such as the distance metric. Data are generated with a varying number of clusters, total number of individuals and time points, number of variables, and proportion of noisy variables, while the time series follow well-shaped patterns typically observed in emotional behavior. After clustering was applied to all simulated datasets, performance was first assessed by comparing the true and predicted labels, and the impact of the different dataset parameters was also examined. Because ground-truth labels are not always available, or may not even exist, in real-world scenarios, clustering evaluation through distance-based and distance-free measures was further investigated. Overall, all clustering methods (e.g. k-means, hierarchical clustering, fuzzy k-medoids) proved reliable in different configurations, revealing the true number of clusters. Moreover, kernel-based methods appeared more efficient when highly noisy variables were involved, making them more promising for real-world data. In a second part, two specific simulated scenarios (datasets) are illustrated, showing in more detail all the analysis steps taken before drawing a conclusion about the optimal number of clusters.
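The workflow the abstract describes (simulate labelled series of unequal length, cluster them with a time-series distance, then compare predicted and true labels) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`simulate`, `dtw_distance`, `k_medoids`, `rand_index`), the sine-shaped patterns, the noise level and the cluster sizes are all assumptions made for the example. It uses dynamic time warping as the distance, a crisp k-medoids in place of the fuzzy variant named in the abstract, and the plain Rand index as a stand-in for whichever label-agreement measure the paper uses.

```python
import math
import random


def simulate(n_per_cluster=5, n_vars=2, seed=0):
    """Two hypothetical groups: a sine pattern and its mirror image, plus noise."""
    rng = random.Random(seed)
    series, labels = [], []
    for label, sign in enumerate((1.0, -1.0)):
        for _ in range(n_per_cluster):
            length = rng.randint(20, 30)  # unequal lengths mimic irregular sampling
            ts = [[sign * math.sin(2 * math.pi * t / length) + rng.gauss(0, 0.1)
                   for _ in range(n_vars)] for t in range(length)]
            series.append(ts)
            labels.append(label)
    return series, labels


def dtw_distance(a, b):
    """Multivariate DTW with squared Euclidean local cost between time points."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def k_medoids(dist, k, n_iter=20):
    """Crisp k-medoids on a precomputed distance matrix (farthest-first init)."""
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign_all():
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    for _ in range(n_iter):
        assign = assign_all()
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign_all()


def rand_index(a, b):
    """Fraction of pairs on which two labelings agree (same vs. different cluster)."""
    pairs = [(i, j) for i in range(len(a)) for j in range(i + 1, len(a))]
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)


series, truth = simulate()
dist = [[dtw_distance(a, b) for b in series] for a in series]
pred = k_medoids(dist, k=2)
score = rand_index(truth, pred)
print(f"Rand index: {score:.2f}")
```

With two well-separated patterns and low noise, the predicted labels should match the simulated ones. Raising the noise level or adding pure-noise variables to `simulate` is one way to probe the sensitivity to noisy variables that the abstract mentions.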
Pages: 19