Analyzing multimedia data, which often comprises diverse views such as text, images, and videos, presents unique challenges for data processing. Deep matrix factorization (DMF) provides an elegant way to obtain a reduced-dimensional representation of the multiview data produced by multimedia. Compared with single-layer matrix factorization, DMF can better discover hierarchical information in a layerwise manner. However, existing multiview DMF methods still have several problems: 1) standard DMF based on the Frobenius norm handles data containing noise and outliers poorly; 2) most DMF methods do not exploit feature diversity to learn a more discriminative representation; and 3) graph learning methods for DMF rely on kNN to construct data graphs, which results in many incorrect neighbor assignments. To address these issues, this article proposes a robust multiview deep nonnegative matrix factorization with feature diversity and optimal graph learning (RMvDNMF-FG) for clustering. Specifically, a noise-insensitive logarithmic loss function is designed to measure the factorization error, inner products of basis vectors are minimized to achieve feature diversity and thereby obtain a discriminative representation, and an optimal graph construction strategy is proposed to preserve the geometric structure of the data. To solve the proposed model, we develop an iterative updating algorithm under which the objective function decreases consistently as the number of iterations increases. A convergence proof of the algorithm is provided with detailed mathematical analysis. Comparative experiments against eleven state-of-the-art algorithms on five multiview datasets demonstrate the effectiveness of the proposed method.
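
To make two of the ingredients named above concrete, the following is a minimal, single-layer, single-view sketch in Python with NumPy. It is not the RMvDNMF-FG objective: the abstract does not state the exact form of the logarithmic loss, so the per-sample form log(1 + ||x_j - W h_j||^2) used here is an assumption, and the function names `frobenius_loss`, `log_loss`, and `diversity_penalty` are hypothetical. The sketch only illustrates why a logarithmic error measure is less sensitive to an outlier than the squared Frobenius norm, and how summing inner products between distinct basis vectors can quantify a lack of feature diversity.

```python
import numpy as np


def frobenius_loss(X, W, H):
    """Squared Frobenius reconstruction error ||X - W H||_F^2."""
    R = X - W @ H
    return float(np.sum(R ** 2))


def log_loss(X, W, H):
    """Assumed robust loss: sum over samples j of log(1 + ||x_j - W h_j||^2).

    This form is an illustrative assumption, not taken from the paper.
    """
    R = X - W @ H
    col_err = np.sum(R ** 2, axis=0)  # squared residual of each sample (column)
    return float(np.sum(np.log1p(col_err)))


def diversity_penalty(W):
    """Sum of inner products between distinct basis vectors (columns of W).

    For nonnegative W this is zero exactly when the columns are mutually
    orthogonal, so minimizing it pushes basis vectors toward distinct features.
    """
    G = W.T @ W  # Gram matrix of the basis vectors
    return float(np.sum(G) - np.trace(G))  # off-diagonal entries only


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((20, 4))   # nonnegative basis, 20 features, 4 components
    H = rng.random((4, 30))   # nonnegative representation of 30 samples
    X = W @ H                 # clean data, reconstructed exactly

    X_out = X.copy()
    X_out[:, 0] += 100.0      # corrupt one sample with a large outlier

    print("Frobenius loss with outlier:", frobenius_loss(X_out, W, H))
    print("Log loss with outlier:      ", log_loss(X_out, W, H))
    print("Diversity penalty of W:     ", diversity_penalty(W))
```

Under these assumptions, the corrupted sample contributes its full squared residual to the Frobenius loss but only the logarithm of that residual to the logarithmic loss, so a single outlier has far less influence on the fit.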