Q-mode versus R-mode Principal Component Analysis for Linear Discriminant Analysis (LDA)

Cited by: 6
Authors
Lee, Loong Chuen [1 ,2 ]
Liong, Choong-Yeun [2 ]
Jemain, Abdul Aziz [2 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Hlth Sci, Forens Sci Program, Ukm Kuala Lumpur 50300, Malaysia
[2] Univ Kebangsaan Malaysia, Sch Math Sci, Fac Sci & Technol, Ukm Bangi 43600, Selangor De, Malaysia
Keywords
principal component analysis (PCA); linear discriminant analysis (LDA); Forensic paper analysis; IR spectrum; CLASSIFICATION; SPECTROSCOPY;
DOI
10.1063/1.4982862
Chinese Library Classification number
O59 [Applied Physics];
Abstract
Many studies apply Principal Component Analysis (PCA) as a preliminary visualization method, a variable construction method, or both. The focus of PCA can be on the samples (R-mode PCA) or on the variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach for reducing high-dimensional data before applying Linear Discriminant Analysis (LDA) to solve classification problems. The output of PCA is composed of two new matrices, known as the loadings and scores matrices. Each matrix can be used to produce a plot: the loadings plot aids identification of important variables, whereas the scores plot presents the spatial distribution of samples on new axes, also known as Principal Components (PCs). Fundamentally, the scores matrix always supplies the input variables for building a classification model. A recent paper used Q-mode PCA, but the focus of its analysis was on the samples rather than the variables. As a result, the authors exchanged the roles of the loadings and scores plots: clustering of samples was studied using the loadings plot, whereas the scores plot was used to identify important manifest variables. The aim of this study is therefore to statistically validate the proposed practice. Evaluation is based on the external error of LDA models as a function of the number of PCs. In addition, bootstrapping was conducted to evaluate the external error of each LDA model. Results show that LDA models built on PCs from R-mode PCA give logical performance and that the corresponding external errors are unbiased, whereas those built on PCs from Q-mode PCA show the opposite. We conclude that PCs produced by Q-mode PCA are not statistically stable and thus should not be applied to problems of classifying samples, but rather to problems of classifying variables. We hope this paper provides some insight into this disputed issue.
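The relationship between the two modes and their scores and loadings matrices can be illustrated with a minimal NumPy sketch. It is not taken from the paper. It assumes, following the abstract's wording, that R-mode PCA decomposes a matrix with samples as rows and that Q-mode PCA decomposes its transpose. The helper `pca`, the toy matrix `X`, and its dimensions are hypothetical.

```python
import numpy as np


def pca(data, n_components):
    """PCA via SVD of the column-centered data matrix (illustrative helper).

    Returns (scores, loadings): scores has one row per row of `data`,
    loadings has one row per column of `data`.
    """
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings


rng = np.random.default_rng(0)
# Hypothetical toy data: 12 samples (rows) x 40 variables (columns).
X = rng.normal(size=(12, 40))

# R-mode in the abstract's sense (assumed): rows are samples,
# so the scores describe samples and the loadings describe variables.
r_scores, r_loadings = pca(X, n_components=3)

# Q-mode in the abstract's sense (assumed): rows are variables,
# so the roles swap and the loadings now have one row per sample.
q_scores, q_loadings = pca(X.T, n_components=3)

print(r_scores.shape, r_loadings.shape)  # (12, 3) (40, 3)
print(q_scores.shape, q_loadings.shape)  # (40, 3) (12, 3)
```

Under these assumptions, `r_scores` has one row per sample and is the kind of matrix that would be passed to an LDA classifier. In the Q-mode decomposition the only matrix with one row per sample is `q_loadings`, which corresponds to the exchanged use of the two plots that the abstract describes.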
Pages: 5