Pest-ConFormer: A hybrid CNN-Transformer architecture for large-scale multi-class crop pest recognition

Cited by: 2
Authors
Fang, Mingwei [1]
Tan, Zhiping [1]
Tang, Yu [1]
Chen, Weizhao [1]
Huang, Huasheng [1]
Dananjayan, Sathian [2]
He, Yong [3]
Luo, Shaoming [4]
Affiliations
[1] Guangdong Polytech Normal Univ, Interdisciplinary Studies, Guangzhou, Peoples R China
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, Tamil Nadu, India
[3] Zhejiang Univ, Coll Biosyst Engn & Food Sci, Hangzhou, Peoples R China
[4] Foshan Univ, Sch Mechatron Engn & Automat, Foshan, Peoples R China
Keywords
Crop pest classification; Transformer; Graph Convolutional Network; Fine-grained visual classification; Neural network; Identification
DOI
10.1016/j.eswa.2024.124833
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Crop pests are acknowledged as a major factor in reducing the yield and quality of agricultural production worldwide. Accurate recognition of crop pests at an early stage is therefore essential for protecting crops and limiting economic losses. Owing to the ecological characteristics of crop pests and the complex backgrounds encountered in the field, pest images exhibit high inter-class similarity and large intra-class variation in external morphological appearance, which causes current recognition methods to suffer from low classification accuracy and poor generalization in complex natural environments. To tackle this problem, a hybrid convolutional neural network (CNN) and Transformer model, Pest-ConFormer, is proposed. It features multi-scale weakly supervised feature-selection mechanisms, which have shown excellent multi-scale discriminative feature extraction in fine-grained visual classification (FGVC) tasks. First, a hybrid convolution-Transformer encoder, pre-trained in a self-supervised masked-autoencoder manner, serves as the backbone to learn highly discriminative pest features across scales. Second, a dual-path feature-aggregation structure, combining a top-down FPN-like pathway and a bottom-up PANet-like pathway built on attention mechanisms, is designed to learn high-level global context and low-level local detail representations. Third, a fine-grained classification module based on weakly supervised learning selects discriminative feature points at the different pyramid levels, and these points are fed into a graph convolutional network to complete the classification. Experiments on the large-scale multi-class IP102 benchmark dataset show that the proposed method achieves a crop pest recognition accuracy of 77.81%, outperforming other state-of-the-art methods by nearly 2 percentage points. The results demonstrate that the proposed hybrid architecture, with its dual-path feature aggregation and fine-grained classification modules, is more effective for crop pest recognition than CNN-based methods and can be deployed in practical natural environments. The source code will be available at https://github.com/mwfang/pestconformer.
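For orientation, the following is a minimal PyTorch-style sketch of the pipeline the abstract describes: a hybrid CNN-Transformer backbone, dual-path attention-gated feature aggregation, weakly supervised selection of discriminative feature points, and graph-convolutional classification. Every module name, dimension, and design detail here (the two-level pyramid, the top-k activation-based point selection, the dense-affinity graph convolution) is a simplifying assumption made for illustration only; it is not the authors' Pest-ConFormer implementation, which they state will be released at https://github.com/mwfang/pestconformer.

# Minimal, illustrative PyTorch sketch of the pipeline summarized in the abstract.
# All module names, channel sizes, the top-k point selection and the dense-affinity
# graph convolution are simplifying assumptions, not the authors' released code
# (see https://github.com/mwfang/pestconformer for the official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridBackbone(nn.Module):
    """Stand-in for the hybrid CNN-Transformer encoder (assumed to be
    pre-trained separately with a masked-autoencoder objective)."""

    def __init__(self, dim=256):
        super().__init__()
        # Convolutional stem: low-level local detail at higher resolution.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        self.down = nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1)
        # Transformer stage: global context on the downsampled feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        c3 = self.stem(x)                        # (B, D, H, W) CNN features
        c4 = self.down(c3)                       # coarser feature map
        b, d, h, w = c4.shape
        tokens = c4.flatten(2).transpose(1, 2)   # (B, H*W, D)
        c4 = self.transformer(tokens).transpose(1, 2).reshape(b, d, h, w)
        return [c3, c4]                          # two pyramid levels


class DualPathAggregator(nn.Module):
    """Top-down (FPN-like) plus bottom-up (PANet-like) fusion, gated by a
    simple channel-attention block on each level."""

    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid())
             for _ in range(2)]
        )

    def forward(self, feats):
        c3, c4 = feats
        # Top-down: inject global context into the detailed map.
        p3 = c3 + F.interpolate(c4, size=c3.shape[-2:], mode="nearest")
        # Bottom-up: push refined detail back to the coarse map.
        p4 = c4 + F.adaptive_max_pool2d(p3, tuple(c4.shape[-2:]))
        return [p3 * self.attn[0](p3), p4 * self.attn[1](p4)]


class PointGCNHead(nn.Module):
    """Selects the k most activated feature points per pyramid level (a crude
    proxy for the paper's weakly supervised selection) and classifies them
    with a single dense-affinity graph-convolution layer."""

    def __init__(self, dim=256, k=16, num_classes=102):
        super().__init__()
        self.k = k
        self.gcn = nn.Linear(dim, dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, feats):
        nodes = []
        for f in feats:
            b, d, h, w = f.shape
            flat = f.flatten(2)                               # (B, D, H*W)
            idx = flat.norm(dim=1).topk(self.k, dim=1).indices
            idx = idx.unsqueeze(1).expand(-1, d, -1)          # (B, D, k)
            nodes.append(flat.gather(2, idx).transpose(1, 2)) # (B, k, D)
        x = torch.cat(nodes, dim=1)                           # graph nodes
        adj = F.softmax(x @ x.transpose(1, 2), dim=-1)        # node affinities
        x = F.relu(adj @ self.gcn(x))                         # message passing
        return self.cls(x.mean(dim=1))                        # graph-level logits


if __name__ == "__main__":
    backbone, neck, head = HybridBackbone(), DualPathAggregator(), PointGCNHead()
    logits = head(neck(backbone(torch.randn(2, 3, 224, 224))))
    print(logits.shape)  # torch.Size([2, 102]); IP102 has 102 pest classes

The sketch keeps only the structural idea: global context flows top-down into the detailed map, refined detail flows bottom-up, and classification is performed on a small graph of selected feature points rather than on a single pooled global vector.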
Pages: 15