Cross-media semantics understanding, which deals with multimedia data of different modalities, is an emerging topic in social media analysis. One of its most challenging issues is how to represent multimedia data drawn from different modalities. Most traditional multimedia semantics analysis work is based on a single-modality data source, such as Flickr images or YouTube videos, leaving efficient cross-media data representation an open problem. In this paper, we propose a novel nonnegative cross-media recoding approach, which captures co-occurrences across cross-media feature spaces by explicitly learning a common set of basis vectors. Moreover, we impose a nonnegativity constraint on the decomposed matrices so that the basis vectors capture important and locally meaningful features of the cross-media data. We take two representative kinds of multimedia data, image and audio, as experimental data, although our approach can be applied to a wide range of multimedia applications. Experiments are conducted on an image-audio dataset for the tasks of cross-media retrieval and data clustering. The results are encouraging and demonstrate the effectiveness of our approach.
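The abstract does not spell out the factorization, but one plausible reading of "learning a common set of basis vectors under a nonnegativity constraint" is a joint NMF in which paired image and audio features share a nonnegative latent code. The sketch below illustrates that idea only; the factor layout (per-modality bases `W_img`, `W_aud` with a shared code `H`) and the multiplicative update rules are assumptions following standard Lee-Seung-style NMF, not the paper's actual formulation.

```python
import numpy as np

def joint_nmf(X_img, X_aud, k, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch of nonnegative cross-media recoding:
    X_img ~ W_img @ H and X_aud ~ W_aud @ H, where the nonnegative
    code H is shared across paired image/audio samples.
    NOTE: this is an assumed formulation, not the paper's method."""
    rng = np.random.default_rng(seed)
    d1, n = X_img.shape
    d2, _ = X_aud.shape
    # Nonnegative random initialization keeps all factors nonnegative
    # under the multiplicative updates below.
    W_img = rng.random((d1, k))
    W_aud = rng.random((d2, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Per-modality basis updates (standard Frobenius-loss NMF rule).
        W_img *= (X_img @ H.T) / (W_img @ H @ H.T + eps)
        W_aud *= (X_aud @ H.T) / (W_aud @ H @ H.T + eps)
        # Shared-code update sums the gradients from both modalities.
        H *= (W_img.T @ X_img + W_aud.T @ X_aud) / (
            (W_img.T @ W_img + W_aud.T @ W_aud) @ H + eps)
    return W_img, W_aud, H
```

The shared code `H` would then serve as the common representation for cross-media retrieval (comparing samples in the latent space) and clustering (grouping columns of `H`).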