Comparison of dimension reduction techniques in the analysis of mass spectrometry data

Cited by: 13
Authors
Isokaanta, Sini [1]
Kari, Eetu [1,3]
Buchholz, Angela [1]
Hao, Liqing [1]
Schobesberger, Siegfried [1]
Virtanen, Annele [1]
Mikkonen, Santtu [1,2]
Affiliations
[1] Univ Eastern Finland, Dept Appl Phys, Kuopio 70210, Finland
[2] Univ Eastern Finland, Dept Environm & Biol Sci, Kuopio 70210, Finland
[3] Neste Oyj, Espoo 02150, Finland
Funding
Academy of Finland; EU Horizon 2020;
Keywords
EXPLORATORY FACTOR-ANALYSIS; SECONDARY ORGANIC AEROSOL; NONNEGATIVE MATRIX FACTORIZATION; SOURCE APPORTIONMENT; NUMBER; DECONVOLUTION; EMISSIONS; COMPLEX; FIT; VALIDATION;
DOI
10.5194/amt-13-2995-2020
Chinese Library Classification number
P4 [Atmospheric Sciences (Meteorology)];
Discipline classification code
0706 ; 070601 ;
Abstract
Online analysis with mass spectrometers produces complex data sets, consisting of mass spectra with a large number of chemical compounds (ions). Statistical dimension reduction techniques (SDRTs) are able to condense complex data sets into a more compact form while preserving the information included in the original observations. The general principle of these techniques is to investigate the underlying dependencies of the measured variables by combining variables with similar characteristics into distinct groups, called factors or components. Currently, positive matrix factorization (PMF) is the most commonly exploited SDRT across a range of atmospheric studies, in particular for source apportionment. In this study, we used five different SDRTs in analysing mass spectral data from complex gas- and particle-phase measurements during a laboratory experiment investigating the interactions of gasoline car exhaust and alpha-pinene. Specifically, we used four factor analysis techniques, namely principal component analysis (PCA), PMF, exploratory factor analysis (EFA) and non-negative matrix factorization (NMF), as well as one clustering technique, partitioning around medoids (PAM). All SDRTs were able to resolve four to five factors from the gas-phase measurements, including an alpha-pinene precursor factor, two to three oxidation product factors, and a background or car exhaust precursor factor. NMF and PMF provided an additional oxidation product factor, which was not found by other SDRTs. The results from EFA and PCA were similar after applying oblique rotations. For the particle-phase measurements, four factors were discovered with NMF: one primary factor, a mixed-LVOOA (low-volatility oxygenated organic aerosol) factor and two alpha-pinene secondary-organic-aerosol-derived (SOA-derived) factors. PMF was able to separate two factors: semi-volatile oxygenated organic aerosol (SVOOA) and LVOOA.
PAM was not able to resolve interpretable clusters due to general limitations of clustering methods, as the high degree of fragmentation taking place in the aerosol mass spectrometer (AMS) causes different compounds formed at different stages in the experiment to be detected at the same variable. However, when preliminary analysis is needed, or isomers and mixed sources are not expected, cluster analysis may be a useful tool, as the results are simpler and thus easier to interpret. In the factor analysis techniques, any single ion generally contributes to multiple factors, although EFA and PCA try to minimize this spread. Our analysis shows that different SDRTs put emphasis on different parts of the data, and with only one technique, some interesting data properties may remain undiscovered. Thus, validation of the acquired results, either by comparing between different SDRTs or by applying one technique multiple times (e.g. by resampling the data or giving different starting values for iterative algorithms), is important, as it may protect the user from dismissing unexpected results as "un-physical".
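The validation idea at the end of the abstract (running one iterative technique several times with different starting values) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic NMF with multiplicative updates, applied to small synthetic "spectra", and all function names (`nmf_multiplicative`, `best_match_similarity`, `make_synthetic_spectra`) are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def nmf_multiplicative(X, n_factors, n_iter=500, seed=0, eps=1e-12):
    """Approximate non-negative X (time x ions) as W @ H.

    Generic multiplicative-update NMF (an assumption for this sketch,
    not the algorithm used in the study). The seed sets the random
    starting values of W (time series) and H (factor profiles).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_factors)) + eps
    H = rng.random((n_factors, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Frobenius norm of the residual relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


def best_match_similarity(H_a, H_b, eps=1e-12):
    """Cosine similarity of each profile in H_a to its closest profile in H_b."""
    A = H_a / (np.linalg.norm(H_a, axis=1, keepdims=True) + eps)
    B = H_b / (np.linalg.norm(H_b, axis=1, keepdims=True) + eps)
    return (A @ B.T).max(axis=1)


def make_synthetic_spectra(seed=42):
    """Synthetic data: a decaying 'precursor' and a growing 'product' factor."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 120)
    series = np.column_stack([np.exp(-4.0 * t), 1.0 - np.exp(-4.0 * t)])
    profiles = np.zeros((2, 30))
    profiles[0, :15] = rng.random(15)   # ions 0-14 belong to the precursor
    profiles[1, 10:] = rng.random(20)   # ions 10-29 belong to the product
    noise = 0.01 * rng.random((120, 30))
    return series @ profiles + noise


if __name__ == "__main__":
    X = make_synthetic_spectra()
    runs = [nmf_multiplicative(X, n_factors=2, seed=s) for s in range(3)]
    _, H_ref = runs[0]
    for s, (W, H) in enumerate(runs):
        print(f"seed {s}: relative error = {relative_error(X, W, H):.4f}, "
              f"profile similarity to seed 0 = {best_match_similarity(H_ref, H)}")
```

If the factor profiles from different starting values match closely, the solution is at least stable with respect to initialization; low similarities suggest the factorization is ambiguous for the chosen number of factors and deserves a closer look. The same comparison could be repeated on resampled rows of `X`.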
Pages: 2995-3022
Number of pages: 28