Principal Component Pursuit for Pattern Identification in Environmental Mixtures

Cited by: 5
Authors
Gibson, Elizabeth A. [1]
Zhang, Junhui [2]
Yan, Jingkai [3]
Chillrud, Lawrence [1]
Benavides, Jaime [1]
Nunez, Yanelli [1]
Herbstman, Julie B. [1]
Goldsmith, Jeff [4]
Wright, John [3]
Kioumourtzoglou, Marianthi-Anna [1]
Affiliations
[1] Columbia Univ, Dept Environm Hlth Sci, Mailman Sch Publ Hlth, New York, NY USA
[2] Columbia Univ, Dept Appl Phys & Appl Mathemat, New York, NY USA
[3] Columbia Univ, Dept Elect Engn, Data Sci Inst, New York, NY USA
[4] Columbia Univ, Dept Biostat, Mailman Sch Publ Hlth, New York, NY USA
DOI
10.1289/EHP10479
Chinese Library Classification
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
BACKGROUND: Environmental health researchers often aim to identify sources or behaviors that give rise to potentially harmful environmental exposures.
OBJECTIVE: We adapted principal component pursuit (PCP), a robust and well-established technique for dimensionality reduction in computer vision and signal processing, to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent patterns of exposure across pollutants and a sparse matrix isolating unique or extreme exposure events.
METHODS: We adapted PCP to accommodate nonnegative data, missing data, and values below a given limit of detection (LOD). We simulated data to represent environmental mixtures of two sizes with increasing proportions <LOD and three noise structures. We applied principal component pursuit with limit of detection (PCP-LOD) to the simulated data to evaluate its performance in comparison with principal component analysis (PCA). We next applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001-2002 National Health and Nutrition Examination Survey (NHANES). We applied singular value decomposition to the estimated low-rank matrix to characterize the patterns.
RESULTS: PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated data sets with up to 50% of the data <LOD. When 75% of values were <LOD, PCP-LOD outperformed PCA only when noise was low. In the POP mixture, PCP-LOD identified a rank-three underlying structure and separated 6% of values as extreme events. One pattern represented comprehensive exposure to all POPs. The other patterns grouped chemicals based on known structure and toxicity.
DISCUSSION: PCP-LOD serves as a useful tool to express multidimensional exposures as consistent patterns that, if found to be related to adverse health, are amenable to targeted public health messaging.
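Illustrative sketch (not from the paper): the low-rank plus sparse decomposition described in the abstract can be illustrated with the classical convex form of PCP, which minimizes the nuclear norm of L plus lambda times the L1 norm of S subject to L + S = D. The Python code below is a minimal augmented Lagrangian solver for that classical problem only. It does not implement PCP-LOD: the nonnegativity, missing-data, and <LOD adaptations are omitted. The function names, the default penalty `mu`, and the simulated data are assumptions made for this example.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Shrink the singular values of X by tau (promotes low rank)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Classical PCP: split D into low-rank L and sparse S with L + S = D.

    Assumed defaults (common choices, not taken from the paper):
    lam = 1 / sqrt(max(n, p)) and mu = n * p / (4 * sum(|D|)).
    """
    D = np.asarray(D, dtype=float)
    n, p = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, p))
    if mu is None:
        mu = n * p / (4.0 * np.abs(D).sum())
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)  # Lagrange multipliers for the constraint L + S = D
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft_threshold(D - L + Y / mu, lam / mu)  # sparse update
        R = D - L - S                                 # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_D:
            break
    return L, S


# Hypothetical simulated mixture: 60 samples, 20 pollutants, 3 patterns,
# plus a few large spikes standing in for extreme exposure events.
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 20))
spikes = np.zeros((60, 20))
mask = rng.random((60, 20)) < 0.05
spikes[mask] = rng.normal(scale=10.0, size=mask.sum())
D = scores @ loadings + spikes

L_hat, S_hat = pcp(D)
# As in the abstract, an SVD of the low-rank estimate characterizes the patterns.
singular_values = np.linalg.svd(L_hat, compute_uv=False)
```

In this sketch, the rows of the right singular vectors of `L_hat` play the role of pollutant loadings for each pattern, and the nonzero entries of `S_hat` flag candidate extreme events.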
Pages: 10