Let $(X, Y)$ be a random couple in $S \times T$ with unknown distribution $P$. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be i.i.d. copies of $(X, Y)$, $P_n$ being their empirical distribution. Let $h_1, \dots, h_N : S \mapsto [-1, 1]$ be a dictionary consisting of $N$ functions. For $\lambda \in \mathbb{R}^N$, denote $f_\lambda := \sum_{j=1}^N \lambda_j h_j$. Let $\ell : T \times \mathbb{R} \mapsto \mathbb{R}$ be a given loss function, which is convex with respect to the second variable. Denote $(\ell \bullet f)(x, y) := \ell(y; f(x))$. We study the following penalized empirical risk minimization problem
$$
\hat{\lambda}^{\varepsilon} := \mathop{\mathrm{argmin}}_{\lambda \in \mathbb{R}^N} \Bigl[ P_n(\ell \bullet f_\lambda) + \varepsilon \|\lambda\|_{\ell_p}^p \Bigr],
$$
which is an empirical version of the problem
$$
\lambda^{\varepsilon} := \mathop{\mathrm{argmin}}_{\lambda \in \mathbb{R}^N} \Bigl[ P(\ell \bullet f_\lambda) + \varepsilon \|\lambda\|_{\ell_p}^p \Bigr]
$$
(here $\varepsilon \ge 0$ is a regularization parameter; $\lambda^0$ corresponds to $\varepsilon = 0$). A number of regression and classification problems fit this general framework. We are interested in the case when $p \ge 1$ is close enough to $1$ (so that $p - 1$ is of the order $1/\log N$, or smaller). We show that the ``sparsity'' of $\lambda^{\varepsilon}$ implies the ``sparsity'' of $\hat{\lambda}^{\varepsilon}$, and we study the impact of ``sparsity'' on bounding the excess risk $P(\ell \bullet f_{\hat{\lambda}^{\varepsilon}}) - P(\ell \bullet f_{\lambda^0})$ of solutions of empirical risk minimization problems.
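As a concrete illustration of the empirical problem above, the following is a minimal numerical sketch, not the paper's method: it assumes a hypothetical setup with $S = [0,1]$, $T = \{-1, +1\}$, the logistic loss $\ell(y, u) = \log(1 + e^{-yu})$ (convex in $u$), a trigonometric dictionary taking values in $[-1, 1]$, and a generic derivative-free solver; all of these choices are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical data: S = [0, 1], labels in T = {-1, +1}.
n, N = 200, 20
X = rng.uniform(0.0, 1.0, size=n)
Y = np.where(np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(n) > 0, 1.0, -1.0)

def dictionary(x):
    """h_1, ..., h_N evaluated at x; each h_j maps S into [-1, 1]."""
    j = np.arange(1, N + 1)
    return np.sin(np.pi * np.outer(x, j))        # shape (len(x), N)

H = dictionary(X)                                # H[i, j] = h_j(X_i)

def penalized_risk(lam, eps, p):
    """P_n(l . f_lambda) + eps * ||lambda||_p^p with the logistic loss
    l(y, u) = log(1 + exp(-y u)), which is convex in u."""
    margins = Y * (H @ lam)                      # y_i * f_lambda(X_i)
    emp_risk = np.mean(np.logaddexp(0.0, -margins))
    return emp_risk + eps * np.sum(np.abs(lam) ** p)

p = 1.0 + 1.0 / np.log(N)                        # p - 1 of order 1/log N
eps = 0.1                                        # regularization parameter
res = minimize(penalized_risk, x0=np.zeros(N), args=(eps, p), method="Powell")
lam_hat = res.x                                  # the empirical solution
print("coordinates above 1e-3 in magnitude:", np.sum(np.abs(lam_hat) > 1e-3))
```

Since $p > 1$, the penalty $|\lambda_j|^p$ is differentiable and the objective is convex, so any standard smooth convex solver would also apply; the point of taking $p - 1 \asymp 1/\log N$ is that the penalty behaves nearly like the $\ell_1$ norm, which is what drives the sparsity of the solution.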