This study used a mixed-methods design to investigate the reliability and validity of the Ounce Scale, an authentic, observational assessment of infants' and toddlers' development from birth through 42 months of age. Quantitative cross-sectional data were collected from 287 children and 124 teachers in seven urban Early Head Start programs; qualitative data were derived from interviews with 21 teachers and seven supervisors. Data were collected across eight age groups. Results showed moderate reliability of the Ounce Scale and provided evidence of agreement with criterion measures, supporting concurrent validity. Receiver operating characteristic (ROC) curve analyses demonstrated very good accuracy in predicting which children were at risk or not at risk. Hierarchical regression analyses indicated that, after controlling for child and family variables, the Ounce Scale contributed significantly to explaining the variance in children's performance on the criterion measures. Analysis of the qualitative interview data elaborated on these findings in terms of the strength-based philosophy of the caregivers, the binary structure of the scale, the cultural context in which the scale was used, and the need for additional professional development. The discussion also addresses the relationship between norm-referenced and performance-based assessments in early childhood.