Distortion based on structural similarity (SSIM), D_SSIM, is more consistent with human perception than the traditional mean squared error distortion, D_MSE. To achieve better video encoding quality, many studies on optimal bit allocation (OBA) have used D_SSIM as the distortion metric. However, these studies still used MSE-based rate-distortion optimization (RDO). This inconsistency between the optimization goals of OBA and RDO results in non-optimal SSIM-based encoding performance. To solve this problem, we propose an accurate coding tree unit (CTU) level D_SSIM-D_MSE model, which enables SSIM-based RDO to be performed with the simpler R-D_MSE cost scaled by the SSIM-based Lagrangian parameter λ_SSIM. Moreover, based on this model, the R-D_SSIM model can be accurately estimated from the joint R-D_SSIM-λ_SSIM relationship. With the accurate R-D_SSIM model, the SSIM-based OBA problem is then solved. Accordingly, SSIM-based OBA and SSIM-based RDO are unified in our scheme, called SOSR. Compared with the HEVC reference encoder HM16.20, SOSR saves 5%, 11%, and 17% bitrate at the same SSIM in the commonly used all-intra, hierarchical low-delay-B, and non-hierarchical low-delay-B configurations, respectively, which is superior to existing state-of-the-art SSIM-based OBA schemes.
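
As a rough illustration of why a CTU-level link between the two distortions lets the encoder keep an MSE-based cost, consider the following sketch. It assumes, purely for illustration, a locally linear relation with a positive per-CTU factor \theta; the abstract does not state the actual form of the model, and \theta, m, and J are hypothetical symbols introduced here.

\[
\begin{aligned}
D_{\mathrm{SSIM}} &\approx \theta \, D_{\mathrm{MSE}}, \qquad \theta > 0 \quad \text{(assumed form, constant within one CTU)} \\
J_{\mathrm{SSIM}}(m) &= D_{\mathrm{SSIM}}(m) + \lambda_{\mathrm{SSIM}} \, R(m)
 \approx \theta \left( D_{\mathrm{MSE}}(m) + \frac{\lambda_{\mathrm{SSIM}}}{\theta} \, R(m) \right) \\
\arg\min_{m} J_{\mathrm{SSIM}}(m) &\approx \arg\min_{m} \left( D_{\mathrm{MSE}}(m) + \frac{\lambda_{\mathrm{SSIM}}}{\theta} \, R(m) \right)
\end{aligned}
\]

Under this assumption, choosing the coding mode m that minimizes the SSIM-based cost is equivalent to minimizing the usual MSE-based cost with a multiplier derived from \lambda_{\mathrm{SSIM}}, because a positive constant factor does not change the minimizer.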