This paper gives an introduction to the connection between predictability and information theory and derives new connections between these concepts. A system is said to be unpredictable if the forecast distribution, which gives the most complete description of the future state based on all available knowledge, is identical to the climatological distribution, which describes the state in the absence of time lag information. It follows that a necessary condition for predictability is that the forecast and climatological distributions differ. Information theory provides a powerful framework for quantifying the difference between two distributions in a way that agrees with intuition about predictability. Three information-theoretic measures have been proposed in the literature: predictive information, relative entropy, and mutual information. These metrics are discussed with the aim of clarifying their similarities and differences. All three metrics have attractive properties for defining predictability: they are invariant with respect to nonsingular linear transformations, they decrease monotonically with forecast lead time in stationary Markov systems (in a suitably averaged sense), and, in certain cases, they are easily decomposed into components that optimize them. Relative entropy and predictive information have the same average value, which in turn equals the mutual information. When the variables are jointly normally distributed, optimization of mutual information leads naturally to canonical correlation analysis. Closed-form expressions for these metrics in finite-dimensional, stationary, Gaussian, Markov systems are derived. Relative entropy and predictive information differ most significantly in that the former depends on the ``signal-to-noise ratio'' of a single forecast distribution, whereas the latter does not. Part II of this paper discusses the extension of these concepts to imperfect forecast models.
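
As a sketch of the quantities summarized above (the notation here is illustrative and not necessarily that of the paper): write $p(\mathbf{x})$ for the climatological distribution, $p_\tau(\mathbf{x}\mid\mathbf{i})$ for the forecast distribution at lead time $\tau$ given initial condition $\mathbf{i}$, and $H[q]=-\int q\log q\,d\mathbf{x}$ for differential entropy, with predictive information taken as the difference between the climatological and forecast entropies (one common convention). The standard definitions and the averaging identity stated in the abstract are then
\[
R_\tau \;=\; \int p_\tau(\mathbf{x}\mid\mathbf{i})\,
\log\frac{p_\tau(\mathbf{x}\mid\mathbf{i})}{p(\mathbf{x})}\,d\mathbf{x},
\qquad
P_\tau \;=\; H\!\left[p(\mathbf{x})\right]-H\!\left[p_\tau(\mathbf{x}\mid\mathbf{i})\right],
\qquad
M_\tau \;=\; \mathbb{E}_{\mathbf{i}}\!\left[R_\tau\right]
       \;=\; \mathbb{E}_{\mathbf{i}}\!\left[P_\tau\right].
\]
For a $d$-dimensional Gaussian forecast $\mathcal{N}(\boldsymbol{\mu}_\tau,\boldsymbol{\Sigma}_\tau)$ and climatology $\mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_c)$, the standard closed forms are
\[
R_\tau = \tfrac12\!\left[
\log\frac{\det\boldsymbol{\Sigma}_c}{\det\boldsymbol{\Sigma}_\tau}
+\operatorname{tr}\!\left(\boldsymbol{\Sigma}_c^{-1}\boldsymbol{\Sigma}_\tau\right)
+\boldsymbol{\mu}_\tau^{\mathsf T}\boldsymbol{\Sigma}_c^{-1}\boldsymbol{\mu}_\tau
-d\right],
\qquad
P_\tau = \tfrac12\log\frac{\det\boldsymbol{\Sigma}_c}{\det\boldsymbol{\Sigma}_\tau},
\]
so that only $R_\tau$ carries the ``signal'' term $\boldsymbol{\mu}_\tau^{\mathsf T}\boldsymbol{\Sigma}_c^{-1}\boldsymbol{\mu}_\tau$, which is the sense in which relative entropy depends on the signal-to-noise ratio of a single forecast while predictive information does not. For jointly Gaussian state and initial condition, $M_\tau=-\tfrac12\sum_k\log\!\left(1-\rho_k^2\right)$, where the $\rho_k$ are the canonical correlations; this is the link to canonical correlation analysis noted above.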