Multiview Regularized Discriminant Canonical Correlation Analysis: Sequential Extraction of Relevant Features From Multiblock Data

Times cited: 3
Authors
Mandal, Ankita [1]
Maji, Pradipta [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Biomed Imaging & Bioinformat Lab, Kolkata 700108, India
Keywords
Feature extraction; Correlation; Covariance matrices; Data mining; Data analysis; Optimization; Statistical analysis; Canonical correlation analysis (CCA); feature extraction; multimodal data analysis; ridge regression optimization; SETS;
DOI
10.1109/TCYB.2022.3155875
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A key challenge in real-life high-dimensional data analysis is how to extract significant and relevant features from multiview data. Multiset canonical correlation analysis (MCCA) is a well-known statistical method for multiview data integration. It finds a linear subspace that maximizes the correlations among different views. However, existing methods for finding the multiset canonical variables are computationally very expensive, which restricts the application of MCCA in real-life big data analysis. The covariance matrix of each high-dimensional view may also suffer from the singularity problem due to the limited number of samples. Moreover, existing MCCA-based feature extraction algorithms are, in general, unsupervised. In this regard, a new supervised feature extraction algorithm is proposed, which integrates multimodal multidimensional data sets by solving the maximal correlation problem of MCCA. A new block matrix representation is introduced to reduce the computational complexity of computing the canonical variables of MCCA. The analytical formulation enables efficient computation of the multiset canonical variables under a supervised ridge regression optimization technique. It deals with the "curse of dimensionality" problem associated with high-dimensional data and facilitates the sequential generation of relevant features at significantly lower computational cost. The effectiveness of the proposed multiblock data integration algorithm, along with a comparison with other existing methods, is demonstrated on several benchmark and real-life cancer data sets.
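To make the role of the ridge term concrete, here is a minimal sketch of a generic ridge-regularized MCCA, written as a generalized eigenvalue problem over the stacked views. It is not the algorithm proposed in the paper: it is unsupervised, it does not use the block matrix representation or the sequential extraction described above, and the function name `regularized_mcca` and the parameter `reg` are hypothetical names chosen for illustration. It only shows how adding `reg * I` to each within-view covariance block keeps the problem solvable when a view has more features than samples.

```python
import numpy as np

def regularized_mcca(views, reg=0.1, n_components=1):
    """Generic ridge-regularized MCCA sketch (illustrative, not the paper's method).

    views: list of arrays, each of shape (n_samples, d_k).
    Returns (list of per-view weight matrices of shape (d_k, n_components),
             the corresponding generalized eigenvalues in descending order).
    """
    n = views[0].shape[0]
    centered = [X - X.mean(axis=0) for X in views]
    dims = [X.shape[1] for X in centered]

    # Full covariance of the stacked views: diagonal blocks are within-view
    # covariances, off-diagonal blocks are between-view covariances.
    Z = np.hstack(centered)
    C = Z.T @ Z / (n - 1)

    # Block-diagonal matrix of within-view covariances plus the ridge term,
    # which makes each block positive definite even when d_k > n_samples.
    D = np.zeros_like(C)
    start = 0
    for d in dims:
        block = slice(start, start + d)
        D[block, block] = C[block, block] + reg * np.eye(d)
        start += d

    # Solve C w = lambda D w by whitening with the Cholesky factor of D.
    L = np.linalg.cholesky(D)
    L_inv = np.linalg.inv(L)
    M = L_inv @ C @ L_inv.T
    M = (M + M.T) / 2.0  # enforce symmetry against round-off
    evals, evecs = np.linalg.eigh(M)

    order = np.argsort(evals)[::-1][:n_components]
    W = L_inv.T @ evecs[:, order]

    # Split the stacked weight vectors back into one matrix per view.
    weights = np.split(W, np.cumsum(dims)[:-1], axis=0)
    return weights, evals[order]
```

Projecting each centered view onto its weight matrix gives that view's canonical variables. Larger values of `reg` trade correlation strength for numerical stability, so in practice the value would need to be tuned for the data at hand.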
Pages: 5497-5509
Number of pages: 13