Multiview Regularized Discriminant Canonical Correlation Analysis: Sequential Extraction of Relevant Features From Multiblock Data

Cited by: 3
Authors
Mandal, Ankita [1]
Maji, Pradipta [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Biomed Imaging & Bioinformat Lab, Kolkata 700108, India
Keywords
Feature extraction; Correlation; Covariance matrices; Data mining; Data analysis; Optimization; Statistical analysis; Canonical correlation analysis (CCA); feature extraction; multimodal data analysis; ridge regression optimization; SETS;
DOI
10.1109/TCYB.2022.3155875
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
One of the important issues associated with real-life high-dimensional data analysis is how to extract significant and relevant features from multiview data. Multiset canonical correlation analysis (MCCA) is a well-known statistical method for multiview data integration. It finds a linear subspace that maximizes the correlations among different views. However, existing methods for finding the multiset canonical variables are computationally very expensive, which restricts the application of MCCA in real-life big data analysis. The covariance matrix of each high-dimensional view may also suffer from the singularity problem due to the limited number of samples. Moreover, existing MCCA-based feature extraction algorithms are, in general, unsupervised in nature. In this regard, a new supervised feature extraction algorithm is proposed, which integrates multimodal multidimensional data sets by solving the maximal correlation problem of MCCA. A new block matrix representation is introduced to reduce the computational complexity of computing the canonical variables of MCCA. The analytical formulation enables efficient computation of the multiset canonical variables under a supervised ridge regression optimization technique. It deals with the "curse of dimensionality" problem associated with high-dimensional data and facilitates the sequential generation of relevant features with significantly lower computational cost. The effectiveness of the proposed multiblock data integration algorithm, along with a comparison with other existing methods, is demonstrated on several benchmark and real-life cancer data sets.
Pages: 5497-5509
Number of pages: 13
Related papers
44 in total
  • [41] Hyper-graph based sparse canonical correlation analysis for the diagnosis of Alzheimer's disease from multi-dimensional genomic data
    Shao, Wei
    Xiang, Shunian
    Zhang, Zuoyi
    Huang, Kun
    Zhang, Jie
    METHODS, 2021, 189 : 86 - 94
  • [42] A Canonical Correlation Analysis-Based Dynamic Bayesian Network Prior to Infer Gene Regulatory Networks from Multiple Types of Biological Data
    Baur, Brittany
    Bozdag, Serdar
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2015, 22 (04) : 289 - 299
  • [43] Estimation of relationships between chemical substructures and antibiotic resistance-related gene expression in bacteria: Adapting a canonical correlation analysis for small sample data of gathered features using consensus clustering
    Esaki, Tsuyoshi
    Horinouchi, Takaaki
    Natsume-Kitatani, Yayoi
    Nojima, Yosui
    Sakane, Iwao
    Matsui, Hidetoshi
    CHEM-BIO INFORMATICS JOURNAL, 2020, 20 : 58 - 61
  • [44] A Cross-Angle Propagation Network for Built-Up Area Extraction by Fusing Spatial-Spectral-Angular Features From the ZY-3 Multiview Satellite Imagery: Dataset and Analysis of China's 41 Major Cities
    Zuo, Renxiang
    Huang, Xin
    Li, Jiayi
    Pan, Xiaofeng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62