Pest-ConFormer: A hybrid CNN-Transformer architecture for large-scale multi-class crop pest recognition

Cited by: 2
Authors
Fang, Mingwei [1]
Tan, Zhiping [1]
Tang, Yu [1]
Chen, Weizhao [1]
Huang, Huasheng [1]
Dananjayan, Sathian [2]
He, Yong [3]
Luo, Shaoming [4]
Affiliations
[1] Guangdong Polytech Normal Univ, Interdisciplinary Studies, Guangzhou, Peoples R China
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, Tamilnadu, India
[3] Zhejiang Univ, Coll Biosyst Engn & Food Sci, Hangzhou, Peoples R China
[4] Foshan Univ, Sch Mechatron Engn & Automat, Foshan, Peoples R China
Keywords
Crop pest classification; Transformer; Graph Convolutional Network; Fine-grained visual classification; NEURAL-NETWORK; IDENTIFICATION
DOI
10.1016/j.eswa.2024.124833
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crop pests are a major cause of reduced yield and quality in agricultural production worldwide. Recognizing crop pests accurately at an early stage is therefore urgently needed to protect crops and reduce losses to the agricultural economy. Because of the pests' ecological characteristics and the complex backgrounds found in fields, crop pests show high inter-class similarity and significant intra-class variation in external morphology, so current recognition methods suffer from low classification accuracy and poor generalization in complex natural environments. To tackle this problem, a hybrid convolutional neural network (CNN) and transformer model, Pest-ConFormer, is proposed. It features multi-scale weakly supervised feature selection mechanisms, which have shown excellent multi-scale discriminative feature extraction in fine-grained visual classification (FGVC) tasks. First, the method uses a hybrid convolution-transformer encoder, pre-trained in a self-supervised masked autoencoder manner, as a backbone to learn highly discriminative pest features across various scales. Second, a dual-path feature aggregation structure based on attention mechanisms, with a top-down FPN-like pathway and a bottom-up PANet-like pathway, is designed to learn high-level global context and low-level local detail. Third, a fine-grained classification module using weakly supervised learning is introduced to select the discriminative feature points at different pyramid levels. These feature points are then fed into a graph convolutional network (GCN) for classification. In experiments on the large-scale multi-class IP102 benchmark dataset, the proposed method achieves a crop pest recognition accuracy of 77.81%.
The experimental results show that our approach outperforms other state-of-the-art methods by nearly 2 percentage points, indicating that the proposed hybrid architecture with dual-path feature aggregation and fine-grained classification modules can be more effective for crop pest recognition than CNN-based methods and can be deployed in practical natural environments. The source code will be available at https://github.com/mwfang/pestconformer.
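The last two stages named in the abstract (selecting discriminative feature points at each pyramid level, then classifying them with a graph convolutional network) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of the maximum softmax probability as the selection score, the fully connected graph, and the toy numbers are all assumptions made for illustration; the abstract does not specify any of them. The propagation rule is the standard GCN form ReLU(D^-1/2 (A + I) D^-1/2 H W).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_points(levels, k):
    """Keep the k most confident feature points from each pyramid level.

    `levels` is a list of levels; each level is a list of
    (feature_vector, class_logits) pairs. Confidence is the maximum
    softmax probability, which is an assumed criterion (hypothetical).
    """
    selected = []
    for points in levels:
        ranked = sorted(points, key=lambda p: max(softmax(p[1])), reverse=True)
        selected.extend(feature for feature, _ in ranked[:k])
    return selected


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weight):
    """One graph convolution step: ReLU(a_norm @ features @ weight)."""
    z = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in z]


# Toy input: two pyramid levels, two candidate points each, two classes.
levels = [
    [([1.0, 0.0], [4.0, 0.0]), ([0.0, 1.0], [0.1, 0.0])],
    [([0.5, 0.5], [0.0, 3.0]), ([0.2, 0.1], [0.0, 0.2])],
]
nodes = select_points(levels, k=1)

# Assumed graph: every selected point is connected to every other one.
adj = [[0.0 if i == j else 1.0 for j in range(len(nodes))] for i in range(len(nodes))]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
print(gcn_layer(normalize_adjacency(adj), nodes, identity))
```

In a trained model the weight matrix would be learned and the node features would come from the aggregated pyramid levels. The identity matrix here only makes the neighbor averaging visible.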
Pages: 15