CO-CLUSTERING OF MULTIVARIATE FUNCTIONAL DATA FOR THE ANALYSIS OF AIR POLLUTION IN THE SOUTH OF FRANCE

Cited by: 9
Authors
Bouveyron, Charles [1 ]
Jacques, Julien [2 ]
Schmutz, Amandine [2 ]
Simoes, Fanny [3 ]
Bottini, Silvia [3 ]
Affiliations
[1] Univ Côte d'Azur, INRIA, LJAD, CNRS, Maasai, Nice, France
[2] Univ Lyon, Lab ERIC, Lyon 2, Lyon, France
[3] Univ Côte d'Azur, MSI, Nice, France
Source
ANNALS OF APPLIED STATISTICS | 2022, Vol. 16, No. 3
Keywords
Latent block model; multivariate functional data; SEM-Gibbs algorithm; pollution; co-clustering; MIXTURE MODEL; MORTALITY; DENSITY;
DOI
10.1214/21-AOAS1547
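Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract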
Air pollution is a major threat to public health, with clear links to many diseases, especially cardiovascular ones. The spatiotemporal study of pollution is of great interest to governments and local authorities when deciding on public alerts or new city policies against rising pollution. The aim of this work is to study spatiotemporal profiles of environmental data collected in the south of France (Region Sud) by the public agency AtmoSud. The idea is to better understand the exposure of inhabitants to pollutants over a large territory with important differences in terms of geography and urbanism. The data consist of daily measurements of five environmental variables, namely three pollutants (PM10, NO2, O3) and two meteorological factors (pressure and temperature), recorded over six years. These data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. For this purpose, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model, which assumes for each co-cluster a probabilistic distribution for multivariate functional principal component scores. A stochastic EM algorithm embedding a Gibbs sampler is proposed for model inference, as well as a model selection criterion for choosing the number of co-clusters. Applying the proposed co-clustering algorithm to environmental data of the Region Sud made it possible to divide the region, composed of 357 zones, into six macroareas with common exposure to pollution. We showed that pollution profiles vary with the seasons and that the patterns are similar across the six years studied.
These results can be used by local authorities to develop specific programs to reduce pollution at the macroarea level and to identify periods of the year with high pollution peaks in order to set up targeted health prevention programs. Overall, the proposed co-clustering approach is a powerful resource for analysing multivariate functional data in order to identify intrinsic data structure and to summarize variable profiles over long periods of time.
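As a rough illustration of the latent block model named in the abstract, the generic form of its likelihood can be sketched as below. The notation is an assumption made here for illustration and is not taken from the paper: c_ij stands for the multivariate functional principal component scores of the curve in row i and column j, z and w are binary row and column cluster indicators, alpha and beta are mixing proportions, and f(.; theta_kl) is whatever distribution the model assigns to co-cluster (k, l).

```latex
% Generic latent block model likelihood (illustrative notation, not the paper's).
% c_{ij}        : multivariate FPCA scores of the curve in row i, column j
% z_{ik}, w_{jl}: binary indicators of row cluster k and column cluster l
% \alpha_k, \beta_l : row and column mixing proportions
% f(\cdot;\theta_{kl}) : score distribution assumed for co-cluster (k,l)
p(\mathbf{c};\theta)
  = \sum_{(\mathbf{z},\mathbf{w}) \in \mathcal{Z}\times\mathcal{W}}
    \prod_{i,k} \alpha_k^{z_{ik}}
    \prod_{j,l} \beta_l^{w_{jl}}
    \prod_{i,j,k,l} f(c_{ij};\theta_{kl})^{z_{ik} w_{jl}}
```

The sum runs over all joint row and column partitions, which makes a direct EM step intractable. This is the usual motivation for stochastic variants such as SEM-Gibbs, in which row labels and column labels are sampled alternately, each conditional on the other.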
Pages: 1400-1422
Number of pages: 23