Multiview Regularized Discriminant Canonical Correlation Analysis: Sequential Extraction of Relevant Features From Multiblock Data

Cited by: 3

Authors:

Mandal, Ankita [1]

Maji, Pradipta [1]

Affiliations:

[1] Indian Stat Inst, Machine Intelligence Unit, Biomed Imaging & Bioinformat Lab, Kolkata 700108, India

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2023 / Vol. 53 / No. 09

Keywords:

Feature extraction; Correlation; Covariance matrices; Data mining; Data analysis; Optimization; Statistical analysis; Canonical correlation analysis (CCA); feature extraction; multimodal data analysis; ridge regression optimization; SETS;

DOI:

10.1109/TCYB.2022.3155875

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

One of the important issues in real-life high-dimensional data analysis is how to extract significant and relevant features from multiview data. Multiset canonical correlation analysis (MCCA) is a well-known statistical method for multiview data integration. It finds a linear subspace that maximizes the correlations among different views. However, the existing methods for finding the multiset canonical variables are computationally very expensive, which restricts the application of MCCA in real-life big data analysis. The covariance matrix of each high-dimensional view may also suffer from the singularity problem due to the limited number of samples. Moreover, the existing MCCA-based feature extraction algorithms are, in general, unsupervised in nature. In this regard, a new supervised feature extraction algorithm is proposed, which integrates multimodal multidimensional data sets by solving the maximal correlation problem of the MCCA. A new block matrix representation is introduced to reduce the computational complexity of computing the canonical variables of the MCCA. The analytical formulation enables efficient computation of the multiset canonical variables under a supervised ridge regression optimization technique. It deals with the "curse of dimensionality" problem associated with high-dimensional data and facilitates the sequential generation of relevant features with significantly lower computational cost. The effectiveness of the proposed multiblock data integration algorithm, along with a comparison with other existing methods, is demonstrated on several benchmark and real-life cancer data.
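
To make the abstract's two central ideas concrete (maximizing correlation across several views, and adding a ridge term so that a singular per-view covariance matrix can still be inverted), here is a minimal sketch of a generic ridge-regularized MCCA. It is not the authors' algorithm: it is unsupervised, it does not use their block matrix representation or sequential extraction, and it solves one dense eigenproblem. The function name `regularized_mcca`, the parameter `reg`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def regularized_mcca(views, reg=1e-2, n_components=1):
    """Generic ridge-regularized MCCA sketch (hypothetical helper, not the paper's method).

    views: list of arrays of shape (n_samples, d_i).
    Returns (weights, eigvals): one (d_i, n_components) weight matrix per view,
    and the leading generalized eigenvalues.
    """
    n = views[0].shape[0]
    centered = [X - X.mean(axis=0) for X in views]
    dims = [X.shape[1] for X in centered]

    # Full block covariance matrix over the concatenated views.
    Z = np.hstack(centered)
    C = Z.T @ Z / (n - 1)

    # Block-diagonal whitening matrix (C_ii + reg * I)^(-1/2).
    # The ridge term keeps each block invertible even when d_i > n_samples.
    W = np.zeros_like(C)
    start = 0
    for d in dims:
        sl = slice(start, start + d)
        vals, vecs = np.linalg.eigh(C[sl, sl] + reg * np.eye(d))
        W[sl, sl] = vecs @ np.diag(vals ** -0.5) @ vecs.T
        start += d

    # Symmetric eigenproblem equivalent to C w = lambda * D w,
    # where D is the ridge-regularized block-diagonal part of C.
    M = W @ C @ W
    M = (M + M.T) / 2  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:n_components]

    A = W @ vecs[:, order]
    weights = np.split(A, np.cumsum(dims)[:-1], axis=0)
    return weights, vals[order]

# Synthetic example: three views driven by one shared latent signal.
# The first view has more features (60) than samples (50), so its sample
# covariance matrix is singular without the ridge term.
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=(n, 1))
views = [latent @ rng.normal(size=(1, d)) + 0.1 * rng.normal(size=(n, d))
         for d in (60, 8, 5)]

weights, eigvals = regularized_mcca(views, reg=1e-2, n_components=1)
variates = [(X - X.mean(axis=0)) @ w for X, w in zip(views, weights)]
print(np.corrcoef(variates[1][:, 0], variates[2][:, 0])[0, 1])
```

In this sketch a larger `reg` shrinks the weights more strongly toward directions of high variance, which trades some in-sample correlation for stability when a view has few samples relative to its dimension.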


Pages: 5497-5509

Page count: 13

