Guess, Roberts, Behrens, and Rues (1998) presented reliability data from recordings of behavior state using a 13-category coding system. Interobserver agreement was reported at 63% to 91% across categories. In an attempt at replication, we found lower levels of reliability (0% to 80%). To determine the reasons for the different results, we obtained measurements of behavior states made from video recordings by five of Guess et al.'s observers. Again, replication was unsuccessful. Across observer pairs, mean percentage agreement on occurrence for individual behavior states and participants ranged from 0% to 58% (kappa ranged from 0 to .64). Possible reasons for these failures to replicate are discussed.
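The two reliability indices reported above can be illustrated with a minimal sketch. It assumes that each observer assigns exactly one behavior state per observation interval. It also assumes that occurrence agreement for a given state is the number of intervals in which both observers coded that state, divided by the number of intervals in which at least one observer did. Finally, it assumes that kappa is Cohen's kappa computed on the binary code "state present vs. absent." These definitions are assumptions for illustration, not necessarily the exact procedure used in either study, and the state labels (`S1`, `S2`, `S3`) are hypothetical.

```python
def occurrence_agreement(obs_a, obs_b, state):
    """Percentage agreement on occurrence of `state` (assumed definition):
    intervals where both observers coded `state`, divided by intervals
    where at least one observer coded it. Returns None if neither did."""
    pairs = list(zip(obs_a, obs_b))
    both = sum(1 for a, b in pairs if a == state and b == state)
    either = sum(1 for a, b in pairs if a == state or b == state)
    return None if either == 0 else 100.0 * both / either


def cohens_kappa(obs_a, obs_b, state):
    """Cohen's kappa for the binary code 'state present vs. absent'."""
    n = len(obs_a)
    a = [x == state for x in obs_a]
    b = [y == state for y in obs_b]
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                  # marginal rates of "present"
    p_e = pa * pb + (1 - pa) * (1 - pb)              # agreement expected by chance
    if p_e == 1:
        return None  # kappa is undefined when chance agreement equals 1
    return (p_o - p_e) / (1 - p_e)


# Hypothetical interval-by-interval codes from two observers.
observer_1 = ["S1", "S2", "S1", "S3", "S1"]
observer_2 = ["S1", "S1", "S1", "S3", "S2"]
print(occurrence_agreement(observer_1, observer_2, "S1"))  # 50.0
print(cohens_kappa(observer_1, observer_2, "S1"))          # about 0.167
```

In this toy example, the observers agree on 3 of 5 intervals for state `S1`. Because `S1` is common, chance agreement is high (0.52), so kappa is only about .17 while occurrence agreement is 50%. This shows why the two indices can diverge.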