Unveiling elemental fingerprints: A comparative study of clustering methods for multi-element nanoparticle data

Cited by: 3
Authors
Erfani, Mahdi [1]
Baalousha, Mohammed [2]
Goharian, Erfan [1]
Affiliations
[1] Univ South Carolina, Dept Civil & Environm Engn, Columbia, SC 29208 USA
[2] Univ South Carolina, Ctr Environm Nanosci & Risk, Arnold Sch Publ Hlth, Dept Environm Hlth Sci, Columbia, SC 29201 USA
Funding
U.S. National Science Foundation
Keywords
Engineered nanoparticles; Multi-element single nanoparticle; Mass spectrometry; High dimensional data; Nonlinear clustering; tSNE; Spectral clustering; ICP-MS; TOFMS
DOI
10.1016/j.scitotenv.2023.167176
Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Single particle inductively coupled plasma time-of-flight mass spectrometry (SP-ICP-TOF-MS) generates large datasets describing the multi-element composition of nanoparticles. However, extracting useful information from such datasets is challenging. Hierarchical clustering (HC) has been successfully applied to extract elemental fingerprints from multi-element nanoparticle data obtained by SP-ICP-TOF-MS, but many other clustering approaches that could be applied to these data have not yet been evaluated. This study fills this knowledge gap by comparing the performance of three clustering approaches for analyzing SP-ICP-TOF-MS data: HC, spectral clustering, and t-distributed Stochastic Neighbor Embedding coupled with Density-Based Spatial Clustering of Applications with Noise (tSNE-DBSCAN). The performance of these techniques was evaluated by comparing the size of the extracted clusters and the similarity of the elemental composition of the nanoparticles within each cluster. HC often failed to achieve an optimal clustering solution for SP-ICP-TOF-MS data because it is sensitive to outliers. Spectral clustering and tSNE-DBSCAN extracted clusters that HC did not identify. Spectral clustering, a method based on graph theory, does so by revealing both the global and the local structure of the data. tSNE maps the data into a lower-dimensional space, enabling clustering algorithms such as DBSCAN to identify subclusters with subtle differences in their elemental composition. However, tSNE-DBSCAN can lead to unsatisfactory clustering solutions because tuning the perplexity hyperparameter of tSNE is difficult and time-consuming, and because tSNE does not preserve the relative distances between data points. Although all three clustering approaches successfully extract useful information from SP-ICP-TOF-MS data, spectral clustering outperforms HC and tSNE-DBSCAN by generating clusters that contain a large number of nanoparticles with similar elemental compositions.
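The three approaches compared in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' pipeline. The synthetic data from make_blobs stands in for per-particle element masses, and the helper names (cluster_three_ways, cluster_sizes) and all hyperparameter values (linkage, n_neighbors, perplexity, eps, min_samples) are assumptions chosen only for illustration. Cluster size is one of the two criteria the abstract mentions, so the sketch reports it for each method.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def cluster_three_ways(X, n_clusters=3, perplexity=30.0, eps=3.0,
                       min_samples=5, seed=0):
    """Label the rows of X with HC, spectral clustering, and tSNE-DBSCAN.

    All hyperparameter defaults are illustrative assumptions, not values
    taken from the study.
    """
    X_std = StandardScaler().fit_transform(X)

    # Hierarchical (agglomerative) clustering with a fixed number of clusters.
    hc = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="average"
    ).fit_predict(X_std)

    # Spectral clustering on a k-nearest-neighbor similarity graph.
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="nearest_neighbors",
        n_neighbors=10, random_state=seed,
    ).fit_predict(X_std)

    # tSNE embeds the data in 2-D, then DBSCAN clusters the embedding.
    # DBSCAN marks points it considers noise with the label -1.
    emb = TSNE(
        n_components=2, perplexity=perplexity, random_state=seed
    ).fit_transform(X_std)
    db = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb)

    return {"HC": hc, "spectral": sc, "tSNE-DBSCAN": db}


def cluster_sizes(labels):
    """Return {cluster_id: number_of_points} for one label vector."""
    ids, counts = np.unique(labels, return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


if __name__ == "__main__":
    # Synthetic stand-in: 300 "particles" described by 6 "elements".
    X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)
    for name, labels in cluster_three_ways(X).items():
        print(name, cluster_sizes(labels))
```

On real SP-ICP-TOF-MS data the perplexity and eps values would need tuning, which is the difficulty the abstract attributes to tSNE-DBSCAN.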
Pages: 11