Theorists have found it difficult to reconcile the unity of inner speech as a mental state kind with the diversity of its manifestations. I argue that existing views concerning the content of inner speech fail to accommodate both of these features because they mistakenly assume that its content is to be found in the 'speech processing hierarchy', which includes semantic, syntactic, phonemic, phonetic, and articulatory levels. Upon rejecting this assumption, I offer a position on which the content of inner speech is determined by voice processing, of which speech processing is but one component. The resulting view does justice to the idea that inner speech is a motley assortment of episodes that nevertheless form a kind.