Maize tassels play an important role in maize growth, and accurate identification and counting of tassels in complex field environments is in high demand. In this study, a complete detection and counting system for maize tassels in farmland was established using Unmanned Aerial Vehicle (UAV) remote sensing and computer vision, in order to promote the application of intelligent agriculture in maize production. UAV images were collected in the experimental field during the maize heading stage. Three object detection networks, Faster R-CNN, SSD, and YOLO_X, were selected and trained with transfer learning to achieve high-precision recognition of maize tassels. Specifically, RGB images of maize tassels were first collected by UAV at a height of 10 m on August 9, 2021. Secondly, the UAV images were cut into tiles of 600 × 600 pixels. The same number of samples was then selected from each variety and planting density for the training, validation, and test sets. Finally, weights pretrained on a public dataset were transferred to the target models, and the recognition performance for maize tassels was compared before and after transfer learning. The experimental results show that, after transfer learning, the average precision, recall rate, and accuracy rate of Faster R-CNN increased by 16.41, 21.86, and 10.01 percentage points, respectively, a larger gain than for SSD and YOLO_X. By contrast, the corresponding gains reported for SSD were 3.05 and 1.76 percentage points, and those for YOLO_X were 3.56 and 4.51 percentage points. After transfer learning, the recognition precision, average precision, accuracy, and log-average miss rate (LAMR) of YOLO_X reached 97.16%, 93.60%, 99.84%, and 0.22, respectively, outperforming the Faster R-CNN and SSD networks.
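The tiling step described above, cutting each UAV image into 600 × 600 pixel samples, can be illustrated with a minimal sketch. The abstract does not state the original image size, whether tiles overlap, or how partial tiles at the image edges were handled, so the function name `tile_boxes`, the 4000 × 3000 pixel frame, and the choice to drop partial edge tiles are all assumptions made for illustration only.

```python
from typing import Iterator, Tuple

TILE = 600  # tile edge length in pixels, as stated in the abstract

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int = TILE) -> Iterator[Box]:
    """Yield crop boxes for full, non-overlapping tiles in row-major order.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


# Hypothetical 4000 x 3000 pixel frame; the abstract does not give the image size.
boxes = list(tile_boxes(4000, 3000))
```

Each box can be passed to an image library's crop routine to produce one sample. Dropping partial tiles keeps every sample at exactly 600 × 600 pixels at the cost of discarding a strip along two edges.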
Of the three networks, YOLO_X achieved the best maize tassel detection performance. In addition, Faster R-CNN, SSD, and YOLO_X were used to evaluate model adaptability across five maize varieties. The results showed that the tassels of Zhengdan958 were the easiest to detect, indicating the best adaptability to the models. By contrast, the correlation between the true and predicted numbers of bounding boxes was low for Jingjiuqingzhu16 tassels, indicating poor detection performance. The training dataset for this variety should therefore be expanded in future work so that the models adapt to it better. The five varieties were also tested at four planting densities using the YOLO_X model after transfer learning. The detection error of the model for maize tassels increased significantly with planting density. Tassel density was also estimated, which allows the agronomic phenotype of maize to be obtained efficiently for yield prediction. A systematic investigation clarified the influence of variety differences and planting density on model detection, which was found to depend on many factors, such as the maize plant type, the model parameters, and the feature extraction network. The findings can therefore provide strong support for the intelligent production of maize and agricultural modernization. © 2022 Chinese Society of Agricultural Engineering. All rights reserved.
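The comparison of true and predicted box counts per variety can be illustrated with a short sketch. The abstract does not say which statistic was used, so the Pearson correlation coefficient and the mean absolute error are assumed here, and the function names and the counts are made up for illustration rather than taken from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Average absolute difference between true and predicted counts."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Made-up tassel counts per image tile (not data from the study).
true_counts = [10, 12, 15, 20, 25]
pred_counts = [10, 11, 15, 18, 22]

r = pearson_r(true_counts, pred_counts)
mae = mean_abs_error(true_counts, pred_counts)
```

Computing both values separately for each variety and planting density would show where the predicted counts drift from the manual counts, for example through a growing error at higher densities.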