An Out of Memory tSVD for Big-Data Factorization

Cited by: 2
Authors
Carrillo-Cabada, Hector [1, 3]
Skau, Erik [2]
Chennupati, Gopinath [2]
Alexandrov, Boian [1]
Djidjev, Hristo [2]
Affiliations
[1] Los Alamos National Laboratory, Theoretical Division, T-1 Group, Los Alamos, NM 87544, USA
[2] Los Alamos National Laboratory, Information Sciences, CCS-3 Group, Los Alamos, NM 87544, USA
[3] University of New Mexico, Department of Computer Science, Albuquerque, NM 87131, USA
Keywords
Tensors; Sparse matrices; Matrix decomposition; Memory management; Security; Contracts; Singular value decomposition; tSVD; out of memory; tensor train; singular vectors; tensor networks; MATRIX; LU;
DOI
10.1109/ACCESS.2020.3000508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Singular value decomposition (SVD) is a matrix factorization method widely used for dimension reduction, data analytics, information retrieval, and unsupervised learning. In general, only the singular values of the SVD are needed for most big-data applications. However, methods such as tensor networks require an accurate computation of a substantial number of singular vectors, which can be accomplished through truncated SVD (tSVD). Additionally, many real-world datasets are too big to fit into the available memory, which mandates the development of out of memory algorithms that assume that most of the data resides on an external disk during the entire computation. These algorithms reduce communication to disk and hide part of the remaining communication by overlapping it with computation on blocks of work. Here, building upon previous work on SVD for dense matrices, we present a method for computing a predetermined number of singular vectors, and the corresponding singular values, of a matrix that cannot fit in memory. Our out of memory tSVD can be used in tensor network algorithms. We describe ways of reducing the communication during the computation of the left and right reflectors, which are needed to compute the singular vectors, and introduce a method for estimating the block sizes needed to hide the communication on parallel file systems.
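To make the idea of hiding disk communication behind block-wise work concrete, here is a minimal Python sketch. It is not the method of the paper: instead of left and right reflectors it accumulates the Gram matrix A^T A from row blocks, which only works when the n x n Gram matrix fits in memory. The function names `read_block` and `out_of_core_tsvd`, the raw float64 file layout, and the single prefetch thread are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_block(path, shape, start, stop):
    """Load rows [start, stop) of a row-major float64 matrix stored on disk."""
    mat = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
    return np.array(mat[start:stop])  # copy only this block into memory


def out_of_core_tsvd(path, shape, k, block_rows):
    """Top-k singular values and right singular vectors of a matrix on disk.

    Illustrative Gram-matrix approach (assumption, not the paper's
    reflector-based algorithm). While block i is being multiplied,
    block i + 1 is already being read by a background thread.
    """
    m, n = shape
    gram = np.zeros((n, n))
    starts = list(range(0, m, block_rows))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(
            read_block, path, shape, starts[0], min(starts[0] + block_rows, m)
        )
        for i in range(len(starts)):
            block = pending.result()  # wait for the block read in the background
            if i + 1 < len(starts):
                nxt = starts[i + 1]
                pending = pool.submit(
                    read_block, path, shape, nxt, min(nxt + block_rows, m)
                )
            gram += block.T @ block  # computation overlaps with the next read
    evals, evecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest
    sigma = np.sqrt(np.clip(evals[order], 0.0, None))
    return sigma, evecs[:, order].T  # rows are right singular vectors
```

In this sketch the left singular vectors would need a second pass over the file, computing each block of U as the block of A times V divided by the singular values. Forming A^T A squares the condition number, so small singular values lose accuracy, which is one reason reflector-based approaches are generally preferred when many accurate singular vectors are required.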
Pages: 107749 - 107759
Number of pages: 11