Object detection is a fundamental task in the analysis and interpretation of remote sensing images. Compared with natural images, however, remote sensing images exhibit wide variation in object scale, blurred objects, and complex backgrounds, all of which pose significant challenges for object detection. To address these problems, a task alignment interaction and cross-scale guidance enhancement network (TCNet) is proposed in this letter. First, a generalized mean spatial pyramid pooling (GeMSPP) module is designed and embedded in the backbone to adapt to changes in complex environments and to reduce feature loss. Second, a cross-scale guided enhancement network (CGEN) is proposed to generate high-quality, nonaliasing multiscale target features at each feature level by guiding the fusion of deep features and enhancing feature expression. Third, a task alignment interactive head (TAIH) is adopted to improve the classification and regression accuracy of the predicted boxes, thereby suppressing background interference and highlighting object features. Experiments on the public DIOR and RSOD datasets show that the proposed modules effectively improve detection accuracy and that the network outperforms other state-of-the-art (SOTA) detectors.
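As an illustration of the pooling operator that the name GeMSPP suggests, the following minimal sketch implements standard generalized-mean (GeM) pooling over a single pooling window. It is not the GeMSPP module itself, whose structure is not specified in this abstract; the function name `gem_pool`, the exponent `p`, and the clamp value `eps` are assumptions chosen for illustration.

```python
def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over a flat list of activations.

    Hypothetical helper for illustration only. With p = 1 it reduces to
    average pooling; as p grows the result approaches max pooling.
    """
    if not values:
        raise ValueError("values must be non-empty")
    # Clamp to eps so that fractional powers stay well defined.
    clamped = [max(v, eps) for v in values]
    mean_of_powers = sum(v ** p for v in clamped) / len(clamped)
    return mean_of_powers ** (1.0 / p)


# One pooling window of activations (made-up values).
window = [0.1, 0.4, 0.2, 0.9]
avg_like = gem_pool(window, p=1.0)   # behaves like average pooling
sharp = gem_pool(window, p=3.0)      # weights large activations more
max_like = gem_pool(window, p=50.0)  # approaches the window maximum
```

The single exponent `p` interpolates between average and max pooling, which is one plausible reason a pyramid pooling block might use it to retain more feature detail than max pooling alone.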