Weakly supervised object detection (WSOD) in remote sensing images (RSIs) detects high-value objects using only image-level labels. However, current models still have two problems. First, neighboring instances are easily misclassified because a one-hot label is assigned to every seed instance and to all of its neighboring instances. Second, the supervisory information of each instance classifier refinement (ICR) branch is generated from the predicted class scores of the upper ICR branch rather than from the real label; thus, the prediction errors of each ICR branch accumulate as the supervisory information propagates. To address the first problem, a complete definition of the pseudosoft label (CPSL) of instances is proposed to directly train each ICR branch, where the CPSL of a seed instance is defined by the predicted class scores of the upper ICR branch, and the CPSL of every other instance is determined by the spatial-distance-weighted feature similarity (FS) between that instance and the seed instances. To handle the second problem, an invariant multiple instance learning (IMIL) scheme is proposed to indirectly train each ICR branch with real image-level labels. Furthermore, affine transformations of the original image are incorporated into the baseline model to enhance the invariance of our model. Ablation studies verify the effectiveness of CPSL, IMIL, and their combination. Quantitative comparisons with popular methods show that our method achieves the best results on the NWPU VHR-10.v2 (DIOR) dataset, with 73.63% (31.08%) mean average precision (mAP) and 79.88% (57.52%) correct localization (CorLoc), and qualitative comparisons confirm this result.
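
The CPSL idea described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the function name `pseudo_soft_labels`, the use of cosine similarity as the FS measure, the Gaussian spatial weight with bandwidth `sigma`, the multiplicative combination of the two, and the restriction to a single seed instance.

```python
import numpy as np


def pseudo_soft_labels(scores, features, centers, seed_idx, sigma=0.5):
    """Hypothetical sketch of soft labels for all proposals given one seed.

    scores:   (N, C) class scores predicted by the upper ICR branch
    features: (N, D) proposal feature vectors
    centers:  (N, 2) proposal box centers, normalized to [0, 1]
    seed_idx: index of the seed instance
    sigma:    assumed bandwidth of the Gaussian spatial weight
    """
    # The seed keeps its predicted class scores instead of a one-hot label.
    seed_label = scores[seed_idx]

    # Assumed FS measure: cosine similarity to the seed feature, clipped to [0, 1].
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    similarity = np.clip(unit @ unit[seed_idx], 0.0, 1.0)

    # Assumed spatial weight: decays with distance from the seed center.
    dist = np.linalg.norm(centers - centers[seed_idx], axis=1)
    weight = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

    # Other instances: seed label scaled by the distance-weighted similarity.
    labels = (weight * similarity)[:, None] * seed_label[None, :]
    labels[seed_idx] = seed_label
    return labels


scores = np.array([[0.9, 0.1], [0.2, 0.3], [0.5, 0.5]])
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
centers = np.array([[0.5, 0.5], [0.5, 0.5], [0.9, 0.9]])
labels = pseudo_soft_labels(scores, features, centers, seed_idx=0)
```

In this toy input, the second proposal shares the seed's feature and position and therefore inherits nearly the full seed label, while the third proposal has an orthogonal feature and receives a zero label. With several seeds, the per-seed labels would need to be merged in some way, which this sketch does not attempt.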