This paper presents a theoretical analysis of the RC4 Key Scheduling Algorithm (KSA), in which the nonlinear operation is swapping among the permutation bytes. Explicit formulae are provided for the probabilities with which the permutation bytes at any stage of the KSA are biased towards the secret key. Theoretical proofs of these formulae have been left open since Roos' work (1995). Next, a generalization of the RC4 KSA is analyzed, corresponding to a class of update functions of the indices involved in the swaps. This reveals an inherent weakness of shuffle-exchange key scheduling. Moreover, we show that biases towards the secret key also exist in $S[S[y]]$, $S[S[S[y]]]$, and so on, for initial values of $y$. We additionally show that each byte of $S_N$ actually reveals secret key information. Looking at all the elements of the final permutation $S_N$ and its inverse $S_N^{-1}$, the value of the hidden index $j$ in each round of the KSA can be estimated from a "pair of values" in $0, \ldots, N-1$ with a constant probability of success $\pi = \frac{N-2}{N} \cdot \left(\frac{N-1}{N}\right)^{N-1} + \frac{2}{N}$ (we get $\pi \approx 0.37$ for $N = 256$), which is significantly higher than the random association. Using the values of two consecutive $j$'s, we estimate the $y$-th key byte from at most a "quadruple of values" in $0, \ldots, N-1$ with a probability $> 0.12$. As a secret key of $l$ bytes is repeated at least $\lfloor N/l \rfloor$ times in RC4, these many quadruples can be accumulated to get each byte of the secret key with very high probability (e.g., 0.8 to close to 1) from a small set of values. Based on our analysis of the key scheduling, we show that the secret key of RC4 can be recovered from the state information, with good probability, in a time much less than that of exhaustive search.
Finally, based on the above biases of the permutation after the KSA and other related results, a complete framework is presented to show that many keystream output bytes of RC4 are significantly biased towards several linear combinations of the secret key bytes. The results do not assume any condition on the secret key. We find new biases in the initial as well as in the 256-th and 257-th keystream output bytes.
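The key-scheduling bias described above can be illustrated with a minimal Python sketch. It is not the paper's method, only an empirical check under stated assumptions: `ksa` is the standard RC4 key scheduling, `roos_target` encodes the relation usually attributed to Roos (1995), namely that $S_N[y]$ tends to equal $\frac{y(y+1)}{2} + \sum_{x=0}^{y} K[x] \bmod N$ for small $y$, and `pi_estimate` evaluates the success probability $\pi$ quoted above. The function names, the 16-byte key length, the trial count, and the seed are all hypothetical choices made for illustration.

```python
import random

N = 256  # RC4 permutation size


def ksa(key, n=N):
    """Standard RC4 key scheduling; returns the final permutation S_N."""
    s = list(range(n))
    j = 0
    for i in range(n):
        # The key is repeated cyclically to cover all n rounds.
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    return s


def roos_target(key, y, n=N):
    """Value y(y+1)/2 + K[0] + ... + K[y] (mod n) that S_N[y] is biased towards."""
    return (y * (y + 1) // 2 + sum(key[x % len(key)] for x in range(y + 1))) % n


def roos_hit_rate(y, key_len=16, trials=2000, seed=1):
    """Fraction of random keys for which S_N[y] equals roos_target (illustrative)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        key = [rng.randrange(N) for _ in range(key_len)]
        if ksa(key)[y] == roos_target(key, y):
            hits += 1
    return hits / trials


def pi_estimate(n=N):
    """pi = (n-2)/n * ((n-1)/n)^(n-1) + 2/n, the probability quoted in the abstract."""
    return (n - 2) / n * ((n - 1) / n) ** (n - 1) + 2 / n


if __name__ == "__main__":
    print("pi for N = 256:", round(pi_estimate(), 4))
    print("empirical hit rate for y = 0:", roos_hit_rate(0))
```

For a uniformly random permutation the hit rate would be about $1/N$, so any rate far above that for small $y$ reflects the bias towards the secret key. The sketch covers only the KSA; it does not model the keystream biases or the key-recovery procedure.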