CO-CLUSTERING OF MULTIVARIATE FUNCTIONAL DATA FOR THE ANALYSIS OF AIR POLLUTION IN THE SOUTH OF FRANCE

Cited by: 9

Authors:

Bouveyron, Charles [1]

Jacques, Julien [2]

Schmutz, Amandine [2]

Simoes, Fanny [3]

Bottini, Silvia [3]

Affiliations:

[1] Univ Côte d'Azur, INRIA, LJAD, CNRS, Maasai, Nice, France

[2] Univ Lyon, Lab ERIC, Lyon 2, Lyon, France

[3] Univ Côte d'Azur, MSI, Nice, France

Source:

ANNALS OF APPLIED STATISTICS | 2022 / Vol. 16 / No. 3

Keywords:

Latent block model; multivariate functional data; SEM-Gibbs algorithm; pollution; co-clustering; MIXTURE MODEL; MORTALITY; DENSITY;

DOI:

10.1214/21-AOAS1547

Chinese Library Classification (CLC):

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Air pollution is a major threat to public health, with clear links to many diseases, especially cardiovascular ones. The spatiotemporal study of pollution is of great interest to governments and local authorities when deciding on public alerts or new city policies against rising pollution. The aim of this work is to study spatiotemporal profiles of environmental data collected in the south of France (Région Sud) by the public agency AtmoSud. The idea is to better understand the exposure of inhabitants to pollutants over a large territory with important differences in terms of geography and urbanism. The data consist of daily measurements of five environmental variables, namely, three pollutants (PM10, NO2, O3) and two meteorological factors (pressure and temperature), recorded over six years. These data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. For this purpose a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model which assumes, for each co-cluster, a probabilistic distribution for multivariate functional principal component scores. A stochastic EM algorithm embedding a Gibbs sampler is proposed for model inference, together with a model selection criterion for choosing the number of co-clusters. Applying the proposed co-clustering algorithm to the environmental data of the Région Sud made it possible to divide the region, composed of 357 zones, into six macroareas with common exposure to pollution. We showed that pollution profiles vary according to the seasons, and that the patterns are similar across the six years studied. These results can be used by local authorities to develop specific programs to reduce pollution at the macroarea level and to identify periods of the year with high pollution peaks in order to set up targeted health prevention programs. Overall, the proposed co-clustering approach is a powerful resource for analysing multivariate functional data in order to identify intrinsic data structure and to summarize variable profiles over long periods of time.

Pages: 1400-1422

Page count: 23
