Cancer subtype identification is crucial for understanding tumor heterogeneity. Existing methods for identifying cancer subtypes have mainly applied traditional clustering algorithms (such as k-means and hierarchical clustering) to gene expression data. These traditional approaches, however, group the data along either the gene dimension or the sample dimension alone, so they cannot discover patterns in which similar genes behave similarly over only a subset of conditions (or samples). Bi-clustering groups large-scale gene expression data along the sample and gene dimensions simultaneously. It finds bi-clusters in which related samples exhibit similar expression profiles over a subset of genes, and these bi-clusters can then be used to identify the corresponding cancer subtypes. The discovered bi-clusters offer insights for categorizing cancer subtypes and for precise gene treatments. Incorporating information from gene-gene interaction networks can further improve the quality of the discovered bi-clusters. However, current efforts generally use the networks to weight and select genes. These efforts are often disturbed by noisy interactions and misled by missing interactions. In addition, there are many types of bi-clusters, including constant bi-clusters, constant-row bi-clusters, constant-column bi-clusters, coherent-value additive bi-clusters, and coherent-value multiplicative bi-clusters. To address these limitations and explore multiple types of bi-clusters, in this paper we introduce a gene-gene interaction Network Regularized Bi-Clustering algorithm (NetRBC) based on Semi-Nonnegative Matrix Tri-Factorization (SNMTF). NetRBC first integrates the mean square residuals into SNMTF and optimizes the gene-cluster and sample-cluster indicator matrices by minimizing the sum-squared loss of the discovered bi-clusters. Next, it constructs a graph regularization term from the gene network and the gene-cluster indicator matrix.
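A graph regularization term of this kind can be illustrated with a minimal sketch. The abstract does not state the exact form used by NetRBC, so the code below assumes the standard graph Laplacian penalty tr(GᵀLG), with L = D − A built from a gene-gene adjacency matrix A and G the gene-cluster indicator matrix, added to a sum-squared tri-factorization loss ‖X − GSFᵀ‖². The names `graph_penalty`, `regularized_loss`, and `lam`, and the toy network, are hypothetical and chosen only for illustration.

```python
import numpy as np

def graph_penalty(G, A):
    """Assumed Laplacian penalty tr(G^T L G) with L = D - A.

    Equals 0.5 * sum_ij A_ij * ||G_i - G_j||^2, so it grows when
    interacting genes receive different cluster assignments.
    """
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(G.T @ L @ G))

def regularized_loss(X, G, S, F, A, lam):
    """Sum-squared loss of X ~ G S F^T plus lam times the graph penalty.

    X: genes x samples, G: genes x k, S: k x l, F: samples x l.
    lam is the regularization parameter weighting the network term.
    """
    residual = X - G @ S @ F.T
    return float(np.sum(residual ** 2) + lam * graph_penalty(G, A))

# Toy network over 4 genes: genes 0-1 interact, genes 2-3 interact.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Assignment that keeps each interacting pair in the same cluster.
G_together = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
# Assignment that splits each interacting pair across clusters.
G_split = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

penalty_together = graph_penalty(G_together, A)  # 0.0
penalty_split = graph_penalty(G_split, A)        # 4.0
```

In this toy case the penalty is zero when interacting genes share a cluster and positive when they are separated, which is the behavior a network regularizer is meant to encourage.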
The core idea of the regularization term is that if a pair of genes interact with each other, these genes may co-regulate the production of one cancer subtype, so we expect them to be grouped into the same bi-cluster. NetRBC then incorporates the regularization term into the sum-squared-loss-based SNMTF to guide the collaborative factorization toward the gene-cluster and sample-cluster indicator matrices, and thus to improve the accuracy of cancer subtype categorization. NetRBC uses a regularization parameter to control the contribution of the gene-gene interaction network. We also give an optimization procedure for the gene-cluster and sample-cluster indicator matrices, which uses multiplicative updates to alternately optimize one variable while fixing the others, until convergence. We conduct experiments on six cancer gene expression datasets with known subtypes to comparatively study the performance of NetRBC. We further test NetRBC on two large-scale cancer gene expression datasets from The Cancer Genome Atlas (TCGA) project and use the clinical features of patients to evaluate the performance, since the true subtypes of these samples are unknown. Extensive experimental results show that NetRBC groups patients into subtypes better than competing methods, and that the proposed network regularization term significantly improves the accuracy of cancer subtype categorization. © 2019, Science Press. All rights reserved.