Video co-segmentation typically refers to the task of jointly segmenting the common objects that appear in a given group of videos. In practice, high-dimensional data such as videos are often conceptually modeled as being drawn from a union of subspaces corresponding to multiple categories. Segmenting data into their respective subspaces, known as subspace clustering, therefore has widespread applications in computer vision, including co-segmentation. State-of-the-art subspace clustering methods solve the problem in two steps: learning an affinity matrix, and then applying spectral clustering to that affinity matrix. However, this two-step procedure is insufficient to obtain an optimal solution, since it does not account for the interdependence between the affinity matrix and the segmentation. In this paper, we present a new unified video co-segmentation framework inspired by Structured Sparse Subspace Clustering (S³C), which yields more consistent segmentation results. To improve the detectability of motion features when trajectories are missing, we augment the motion trajectories with an extra signature. Moreover, we reformulate the S³C algorithm by adding an affine subspace constraint, making it better suited to segmenting rigid motions, which lie in affine subspaces of dimension at most three. Experiments on the MOViCS dataset demonstrate the effectiveness of our approach and its robustness to heavy noise.
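
As a rough illustration of the two-step pipeline referred to above (not the authors' S³C formulation), the sketch below builds a sparse self-expressive affinity matrix in the spirit of sparse subspace clustering and then applies spectral clustering to it. The data matrix, the `alpha` penalty, and the toy two-subspace example are illustrative assumptions.

```python
# Minimal sketch of the classical two-step subspace-clustering pipeline:
# (1) learn a sparse self-expressive affinity matrix, (2) apply spectral
# clustering to it. This is a generic SSC-style baseline, not the S³C method
# proposed in the paper; alpha and the toy data are illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_affinity(X, alpha=0.01):
    """Express each column of X as a sparse combination of the other columns."""
    d, n = X.shape
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(X, i, axis=1)          # exclude the point itself
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(others, X[:, i])                # sparse self-expression coefficients
        C[:, i] = np.insert(lasso.coef_, i, 0.0)  # restore a zero on the diagonal
    return np.abs(C) + np.abs(C).T                # symmetrize into an affinity matrix

def two_step_subspace_clustering(X, n_clusters):
    W = sparse_affinity(X)
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)

# Toy usage: points drawn from two 1-D subspaces of R^3.
rng = np.random.default_rng(0)
X = np.hstack([np.outer([1, 0, 0], rng.standard_normal(20)),
               np.outer([0, 1, 0], rng.standard_normal(20))])
print(two_step_subspace_clustering(X, n_clusters=2))
```

Because the affinity matrix is learned first and fixed before spectral clustering, the two stages cannot inform each other, which is the interdependence issue the abstract points to.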