Consider a random sample on variables X(1), ..., X(v) with some values of X(v) missing. Selection models specify the distribution of X(1), ..., X(v) over respondents and nonrespondents to X(v), and the conditional probability that X(v) is missing given X(1), ..., X(v). In contrast, pattern-mixture models specify the conditional distribution of X(1), ..., X(v) given that X(v) is observed or missing, respectively, and the marginal distribution of the binary indicator for whether or not X(v) is missing. For multivariate data with a general pattern of missing values, the literature has tended to adopt the selection-modeling approach (see, for example, Little and Rubin); here, pattern-mixture models are proposed for this more general problem. Pattern-mixture models are chronically underidentified; in particular, for the case of univariate nonresponse mentioned above, there are no data on the distribution of X(v) given X(1), ..., X(v-1) in the stratum with X(v) missing. Thus the models require restrictions or prior information to identify the parameters. Complete-case restrictions tie unidentified parameters to their (identified) analogs in the stratum of complete cases. Alternative types of restriction tie unidentified parameters to parameters in other missing-value patterns or sets of such patterns. This large set of possible identifying restrictions yields a rich class of missing-data models. Unlike ignorable selection models, which generally require iterative methods except for special missing-data patterns, some pattern-mixture models yield explicit maximum likelihood (ML) estimates for general patterns. Such models are readily amenable to Bayesian methods and form a convenient basis for multiple imputation. Some previously considered noniterative estimation methods are shown to be ML under a pattern-mixture model.
For example, Buck's method for continuous data, corrected as in Beale and Little (1975), and Brown's estimators for nonrandomly missing data are ML for pattern-mixture models with particular complete-case restrictions. Available-case analyses, where the mean and variance of X(j) are computed using all cases with X(j) observed and the correlation (or covariance) of X(j) and X(k) is computed using all cases with X(j) and X(k) observed, are also close to ML for another pattern-mixture model. Asymptotic theory for this class of estimators is outlined.
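To make the idea of a complete-case restriction concrete, the following is a minimal sketch for the simplest setting of univariate nonresponse with two variables: X(1) fully observed and X(2) sometimes missing. It is an illustration under stated assumptions, not the general estimator developed in the paper. The function name `cc_restriction_mean`, the use of `np.nan` to mark missing values, and the choice of a linear regression of X(2) on X(1) are all assumptions made for this example. The restriction assumed here is that the regression of X(2) on X(1) in the stratum with X(2) missing equals the regression fitted in the complete cases.

```python
import numpy as np

def cc_restriction_mean(x1, x2):
    """Estimate the mean of X(2) under a complete-case restriction (sketch).

    x1: fully observed variable.
    x2: variable with np.nan marking missing values (hypothetical convention).
    Assumption: the linear regression of X(2) on X(1) among cases with X(2)
    missing is the same as the regression fitted on the complete cases.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    obs = ~np.isnan(x2)      # pattern indicator: True where X(2) is observed
    p_obs = obs.mean()       # estimated marginal probability of the observed pattern
    mean_obs = x2[obs].mean()  # mean of X(2) in the complete-case stratum

    if obs.all():
        # No missing values: the mixture has a single pattern.
        return mean_obs

    # Identified parameters: least-squares line for X(2) on X(1), complete cases only.
    slope, intercept = np.polyfit(x1[obs], x2[obs], 1)

    # Unidentified stratum: the mean of X(2) is borrowed from the complete-case
    # regression, evaluated at the mean of X(1) among cases with X(2) missing.
    mean_mis = intercept + slope * x1[~obs].mean()

    # Mix the two pattern-specific means by the pattern probabilities.
    return p_obs * mean_obs + (1.0 - p_obs) * mean_mis
```

For instance, with `x1 = [1, 2, 3, 4, 5, 6]` and `x2 = [2, 4, 6, nan, nan, nan]`, the complete-case mean of X(2) is 4, whereas the estimate under this restriction is 7, because the cases with X(2) missing have larger values of X(1). In this two-variable case the result coincides algebraically with the familiar regression estimate of the mean, intercept plus slope times the overall mean of X(1).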