Language researchers frequently use indices of lexical diversity as developmental measures in studies of, for example, first and second language acquisition, deafness, Down's syndrome, language impairment, and linguistic input to children, as well as in clinical practice with the language-delayed and in forensic linguistics. The most common approach has been to divide the number of Types by the number of Tokens to produce the Type-Token Ratio (TTR). The TTR, and the measures derived from it (Root-TTR and Log-TTR, for example), are flawed, however, being dependent on the size of the sample of Tokens used. Other procedures, such as Mean Segmental TTR, neither fully exploit the data nor provide a universal basis for comparison. Here, the problems with standard measures and the ways in which their use can lead to anomalous results are demonstrated. Mathematical models of the relationship between TTR and Token size are shown to be a basis for producing a valid measure of lexical diversity which is independent of sample size and can be made more readily accessible to researchers and clinicians than previous mathematical approaches.
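
As a minimal sketch of the quantities named above, the Python code below computes the TTR, its two common transforms, and Mean Segmental TTR. The function names and the 100-token segment length are illustrative choices, not prescribed by the text; the formulas used for Root-TTR (Guiraud) and Log-TTR (Herdan) are the standard ones conventionally associated with those labels.

    import math

    def ttr(tokens):
        """Type-Token Ratio: number of distinct Types / number of Tokens."""
        return len(set(tokens)) / len(tokens)

    def root_ttr(tokens):
        """Guiraud's Root-TTR: Types / sqrt(Tokens)."""
        return len(set(tokens)) / math.sqrt(len(tokens))

    def log_ttr(tokens):
        """Herdan's Log-TTR: log(Types) / log(Tokens)."""
        return math.log(len(set(tokens))) / math.log(len(tokens))

    def mean_segmental_ttr(tokens, segment=100):
        """Mean Segmental TTR: average TTR over consecutive fixed-size
        segments. Trailing tokens that do not fill a whole segment are
        discarded, which is one way the procedure fails to exploit
        all of the data."""
        starts = range(0, len(tokens) - segment + 1, segment)
        segments = [tokens[i:i + segment] for i in starts]
        return sum(ttr(s) for s in segments) / len(segments)

Because the count of Types grows more slowly than the count of Tokens, ttr() falls steadily as a sample lengthens; this is the sample-size dependence identified above. mean_segmental_ttr() holds segment length constant, but it discards leftover tokens and offers no common baseline across studies that choose different segment lengths.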