An Out of Memory tSVD for Big-Data Factorization

Cited by: 2
Authors
Carrillo-Cabada, Hector [1, 3]
Skau, Erik [2]
Chennupati, Gopinath [2]
Alexandrov, Boian [1]
Djidjev, Hristo [2]
Affiliations
[1] Los Alamos Natl Lab, Theoret Div T 1 Grp, Los Alamos, NM 87544 USA
[2] Los Alamos Natl Lab, Informat Sci CCS 3 Grp, Los Alamos, NM 87544 USA
[3] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
Keywords
Tensors; Sparse matrices; Matrix decomposition; Memory management; Security; Contracts; Singular value decomposition; tSVD; out of memory; tensor train; singular vectors; tensor networks; MATRIX; LU;
DOI
10.1109/ACCESS.2020.3000508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Singular value decomposition (SVD) is a matrix factorization method widely used for dimension reduction, data analytics, information retrieval, and unsupervised learning. In general, only the singular values are needed for most big-data applications. Methods such as tensor networks, however, require an accurate computation of a substantial number of singular vectors, which can be accomplished through truncated SVD (tSVD). Additionally, many real-world datasets are too big to fit into the available memory, which mandates the development of out-of-memory algorithms that assume that most of the data resides on an external disk during the entire computation. These algorithms reduce communication to disk and hide part of the communication by overlapping it with computation on blocks of work. Here, building upon previous work on SVD for dense matrices, we present a method for computing a predetermined number of singular vectors, and the corresponding singular values, of a matrix that cannot fit in memory. Our out-of-memory tSVD can be used in tensor network algorithms. We describe ways of reducing the communication during the computation of the left and right reflectors, needed to compute the singular vectors, and introduce a method for estimating the block sizes needed to hide the communication on parallel file systems.
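As a rough illustration of the two ideas in the abstract (computing only the leading singular triplets, and reading the next block from disk while the current block is being processed), the following minimal Python sketch may help. It is not the method of the paper: the paper works with left and right reflectors, whereas this sketch swaps in a simpler Gram-matrix approach (accumulate A^T A by row blocks, then take its leading eigenpairs), which is less accurate for ill-conditioned matrices because it squares the condition number. The names `read_block` and `streamed_tsvd`, the raw float64 file layout, and the block size are all assumptions made for this example, and whether the prefetch truly overlaps with the computation depends on the runtime and the file system.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_block(path, shape, start, stop):
    """Load rows [start, stop) of the on-disk matrix into memory (hypothetical helper)."""
    disk = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
    return np.array(disk[start:stop])  # the copy forces the actual read


def streamed_tsvd(path, shape, k, block_rows):
    """Rank-k truncated SVD of a tall matrix on disk, read one row block at a time.

    Illustrative Gram-matrix variant, NOT the reflector-based method of the paper.
    Assumes the k leading singular values are nonzero.
    """
    m, n = shape
    bounds = [(s, min(s + block_rows, m)) for s in range(0, m, block_rows)]

    # Pass 1: accumulate the n x n Gram matrix A^T A. A single worker thread
    # prefetches the next block while the current one is multiplied.
    gram = np.zeros((n, n))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_block, path, shape, *bounds[0])
        for nxt in bounds[1:] + [None]:
            block = pending.result()
            if nxt is not None:
                pending = pool.submit(read_block, path, shape, *nxt)
            gram += block.T @ block

    # Leading eigenpairs of A^T A give the right singular vectors and sigma^2.
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:k]
    s = np.sqrt(np.clip(evals[top], 0.0, None))
    v = evecs[:, top]

    # Pass 2: left singular vectors U = A V / sigma, again block by block.
    # For a truly large m, U would be written back to disk instead.
    u = np.empty((m, k))
    for start, stop in bounds:
        u[start:stop] = read_block(path, shape, start, stop) @ v / s
    return u, s, v.T


# Small usage example: a random matrix stands in for the on-disk dataset.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 12))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.bin")
    a.tofile(path)  # raw row-major float64, matching read_block
    u, s, vt = streamed_tsvd(path, a.shape, k=4, block_rows=64)
s_ref = np.linalg.svd(a, compute_uv=False)[:4]  # in-memory reference values
```

Only one block plus the small n x n Gram matrix is held in memory at a time during the first pass, which is the property an out-of-memory algorithm relies on; the choice of `block_rows` controls how much read time there is to hide behind each multiplication.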
Pages: 107749-107759
Number of pages: 11