Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping

Times cited: 25
Authors
Horne, Elsie [1]
Tibble, Holly [1]
Sheikh, Aziz [1]
Tsanas, Athanasios [1]
Affiliations
[1] Univ Edinburgh, Edinburgh Med Sch, Usher Inst, Nine Edinburgh Bio Quarter,9 Little France Rd, Edinburgh EH16 4UX, Midlothian, Scotland
Funding
UK Medical Research Council; UK Economic and Social Research Council; Wellcome Trust; UK Engineering and Physical Sciences Research Council
Keywords
asthma; cluster analysis; data mining; machine learning; unsupervised machine learning; SYSTEMATIC ANALYSIS; YOUNG-CHILDREN; GLOBAL BURDEN; 195 COUNTRIES; PHENOTYPES; DISEASE; HETEROGENEITY; TERRITORIES; PREVALENCE; VALIDATION
DOI
10.2196/16452
Chinese Library Classification number
R-058
Abstract
Background: In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging.

Objective: This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies.

Methods: We searched the PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process.

Results: Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed into a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.

Conclusions: This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
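Illustrative sketch: the abstract reports that only 5 of the 39 mixed-type studies using Ward linkage or k-means specified a dissimilarity measure suited to mixed-type features. The abstract does not name a particular measure; the Python sketch below uses the Gower dissimilarity as one well-known example, chosen here as an assumption. The function name `gower_matrix`, the toy patient records, and the feature choices are all hypothetical and are not taken from the reviewed studies. Missing values are not handled.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def gower_matrix(rows: Sequence[Sequence[Value]],
                 is_numeric: Sequence[bool]) -> List[List[float]]:
    """Pairwise Gower dissimilarity for mixed numeric/categorical records.

    Numeric features contribute |x - y| / range; categorical features
    contribute 0 if equal and 1 otherwise. The result is the mean
    contribution over all features, so every entry lies in [0, 1].
    """
    n, p = len(rows), len(is_numeric)

    # Observed range of each numeric feature (0.0 placeholder for categorical).
    ranges = []
    for j in range(p):
        if is_numeric[j]:
            col = [float(r[j]) for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)

    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if is_numeric[j]:
                    if ranges[j] > 0:  # constant features contribute nothing
                        diff = abs(float(rows[a][j]) - float(rows[b][j]))
                        total += diff / ranges[j]
                elif rows[a][j] != rows[b][j]:
                    total += 1.0
            dist[a][b] = dist[b][a] = total / p
    return dist


# Hypothetical records: (age in years, FEV1 % predicted, atopy, smoking status)
patients = [
    (12, 95.0, "yes", "never"),
    (45, 60.0, "no", "current"),
    (14, 90.0, "yes", "never"),
    (50, 55.0, "no", "former"),
]
D = gower_matrix(patients, is_numeric=[True, True, False, False])
```

A matrix such as `D` could then be passed to a method that accepts precomputed dissimilarities, for example a medoid-based method or hierarchical clustering with average linkage. Ward linkage and k-means assume Euclidean geometry, so pairing them with a non-Euclidean dissimilarity needs explicit justification.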
Pages: 20