Multimodal fusion object detection aims to improve detection accuracy by integrating information from multiple modalities. Object detection based on RGB images alone makes it difficult to finely sort solid waste by material. In this article, we build a dual-camera acquisition platform that combines a line-scan color camera and a hyperspectral camera to collect paired RGB and hyperspectral images. To fuse features from RGB and hyperspectral images more effectively, we propose an asymmetric multiscale feature fusion network (AMFFNet) based on RGB-near-infrared (NIR) multisensor fusion technology. First, we design a hyperspectral image convolution unit (HICU) to fully extract multiscale features from hyperspectral images. Second, we concatenate the hyperspectral feature maps with the feature maps output by the feature pyramid network (FPN) in the RGB feature extraction stage, achieving asymmetric multiscale feature fusion. In addition, we propose a dimensionality reduction strategy (DRS) to remove redundant and low signal-to-noise-ratio bands from the hyperspectral images. Ablation studies confirm the effectiveness of the AMFFNet components, and we conduct extensive integration experiments on the solid waste dataset constructed for this article. The experimental results and analysis show that AMFFNet built on the mask region-based convolutional neural network (Mask R-CNN), Faster R-CNN, and RetinaNet detectors outperforms the corresponding original models by 5.05%, 3.57%, and 6.36% in AP at IoU = 0.5 and by 3.1%, 2.05%, and 3.73% on COCO's standard AP metric, while increasing the network parameters by only 0.47-0.56 M. In conclusion, the proposed method effectively improves the performance of object detection models for fine-grained identification of solid waste.
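To make the fusion step concrete, the sketch below is a minimal PyTorch rendering of the idea summarized above, not the authors' implementation: the abstract does not specify HICU's internal design, so the stride schedule, channel widths, class names, and the 1x1 projection after concatenation are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class HICUSketch(nn.Module):
    """Illustrative multiscale extractor for the hyperspectral branch.

    Assumption: each stage halves resolution with a strided 3x3
    convolution, yielding one feature map per pyramid level. The real
    HICU architecture is not described in the abstract.
    """
    def __init__(self, in_bands, out_channels=32, num_levels=4):
        super().__init__()
        self.stages = nn.ModuleList()
        c_in = in_bands
        for _ in range(num_levels):
            self.stages.append(nn.Sequential(
                nn.Conv2d(c_in, out_channels, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ))
            c_in = out_channels

    def forward(self, hsi):
        feats, x = [], hsi
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # one map per pyramid level

class AsymmetricFusionSketch(nn.Module):
    """Concatenate hyperspectral features with FPN outputs channel-wise,
    then project back to the FPN width with a 1x1 convolution."""
    def __init__(self, fpn_channels=256, hsi_channels=32, num_levels=4):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(fpn_channels + hsi_channels, fpn_channels, kernel_size=1)
            for _ in range(num_levels)
        )

    def forward(self, fpn_feats, hsi_feats):
        fused = []
        for p, h, proj in zip(fpn_feats, hsi_feats, self.proj):
            # Assumes the dual-camera platform registers the modalities,
            # so a simple resize suffices for spatial alignment.
            h = nn.functional.interpolate(h, size=p.shape[-2:],
                                          mode="bilinear", align_corners=False)
            fused.append(proj(torch.cat([p, h], dim=1)))
        return fused

# Toy usage with made-up shapes: a 48-band cube and four FPN levels.
hicu = HICUSketch(in_bands=48)
fuse = AsymmetricFusionSketch()
hsi = torch.randn(1, 48, 256, 256)
fpn_feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
fused = fuse(fpn_feats, hicu(hsi))
```

The fused maps would then feed the detector heads (Mask R-CNN, Faster R-CNN, or RetinaNet) unchanged, which is consistent with the small reported parameter overhead.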
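Similarly, the abstract states that DRS removes redundant, low-SNR bands but not how. The NumPy sketch below shows one plausible scoring rule under that assumption; the mean/std SNR estimate and the `keep` parameter are hypothetical choices, not the paper's method.

```python
import numpy as np

def drs_band_selection(cube: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical band-selection step for a DRS-like strategy.

    cube: hyperspectral image of shape (H, W, B).
    Scores each band by a crude SNR proxy (mean over std of its pixel
    values) and keeps the `keep` highest-scoring bands in spectral order.
    """
    bands = cube.reshape(-1, cube.shape[-1]).astype(np.float64)
    snr = bands.mean(axis=0) / (bands.std(axis=0) + 1e-8)
    keep_idx = np.sort(np.argsort(snr)[-keep:])  # preserve band ordering
    return cube[..., keep_idx]
```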