In the subspace sketch problem one is given an $n \times d$ matrix $A$ with $O(\log(nd))$-bit entries and would like to compress it in an arbitrary way into a small-space data structure $Q_p$, so that for any given $x \in \mathbb{R}^d$, with probability at least $2/3$, one has $Q_p(x) = (1 \pm \epsilon)\|Ax\|_p$, where $p \geq 0$ and the randomness is over the construction of $Q_p$. The central question is: how many bits are necessary to store $Q_p$? This problem has applications to the communication complexity of approximating the number of non-zeros in a matrix product, the size of coresets in projective clustering, the memory of streaming algorithms for regression in the row-update model, and the embedding of subspaces of $L_p$ in functional analysis. A major open question is the dependence on the approximation factor $\epsilon$. We show that if $p \geq 0$ is not a positive even integer and $d = \Omega(\log(1/\epsilon))$, then $\tilde{\Omega}(\epsilon^{-2} \cdot d)$ bits are necessary. On the other hand, if $p$ is a positive even integer, then there is an upper bound of $O(d^p \log(nd))$ bits, independent of $\epsilon$. Our results are optimal up to logarithmic factors, and show in particular that one cannot compress $A$ into $O(d)$ "directions" $v_1, \ldots, v_{O(d)}$ such that for any $x$, $\|Ax\|_1$ can be well approximated from $\langle v_1, x\rangle, \ldots, \langle v_{O(d)}, x\rangle$. Our lower bound rules out arbitrary functions of these inner products (and in fact arbitrary data structures built from $A$), and thus rules out the possibility of a singular value decomposition for $\ell_1$ in a very strong sense. Indeed, as $\epsilon \to 0$, for $p = 1$ the space complexity becomes arbitrarily large, while for $p = 2$ it is at most $O(d^2 \log(nd))$. As corollaries of our main lower bound, we obtain new lower bounds for a wide range of applications, including those above, which in many cases are optimal.
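To illustrate why $p = 2$ behaves so differently from $p = 1$: the identity $\|Ax\|_2^2 = x^\top (A^\top A) x$ means the $d \times d$ Gram matrix $A^\top A$ already determines $\|Ax\|_2$ exactly for every query $x$, using space independent of $n$ and $\epsilon$. The snippet below (an illustrative sketch, not the paper's exact data structure) checks this identity numerically:

```python
import numpy as np

# For p = 2, ||Ax||_2^2 = x^T (A^T A) x, so the d x d Gram matrix
# G = A^T A serves as a data structure Q_2 of O(d^2) entries,
# independent of n and of any approximation parameter epsilon.
rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.integers(-10, 11, size=(n, d)).astype(float)

G = A.T @ A  # the entire "sketch": d x d numbers, A can be discarded

x = rng.standard_normal(d)
exact = np.linalg.norm(A @ x)        # needs all of A
from_sketch = np.sqrt(x @ G @ x)     # needs only G
assert np.isclose(exact, from_sketch)
```

No analogous finite-dimensional statistic exists for $\|Ax\|_1$; the lower bound above shows that any data structure for $p = 1$ must use $\tilde{\Omega}(\epsilon^{-2} d)$ bits.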