Pest-ConFormer: A hybrid CNN-Transformer architecture for large-scale multi-class crop pest recognition

Cited by: 2
Authors
Fang, Mingwei [1]
Tan, Zhiping [1]
Tang, Yu [1]
Chen, Weizhao [1]
Huang, Huasheng [1]
Dananjayan, Sathian [2]
He, Yong [3]
Luo, Shaoming [4]
Affiliations
[1] Guangdong Polytech Normal Univ, Interdisciplinary Studies, Guangzhou, Peoples R China
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, Tamil Nadu, India
[3] Zhejiang Univ, Coll Biosyst Engn & Food Sci, Hangzhou, Peoples R China
[4] Foshan Univ, Sch Mechatron Engn & Automat, Foshan, Peoples R China
Keywords
Crop pest classification; Transformer; Graph Convolutional Network; Fine-grained visual classification; Neural network; Identification
DOI
10.1016/j.eswa.2024.124833
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Crop pests are acknowledged as a major factor in reducing the yield and quality of agricultural production worldwide. Accurate recognition of crop pests at an early stage is therefore essential for protecting crops and limiting economic losses. Owing to the ecological characteristics of crop pests and the complex backgrounds encountered in the field, pest images exhibit high inter-class similarity and large intra-class variation in external morphological appearance, which causes current recognition methods to suffer from low classification accuracy and poor generalization in complex natural environments. To tackle this problem, a hybrid convolutional neural network (CNN) and Transformer model, Pest-ConFormer, is proposed. It features multi-scale weakly supervised feature-selection mechanisms, which have shown excellent multi-scale discriminative feature extraction in fine-grained visual classification (FGVC) tasks. First, a hybrid convolution-Transformer encoder, pre-trained in a self-supervised masked-autoencoder manner, serves as the backbone to learn highly discriminative pest features across scales. Second, a dual-path feature-aggregation structure, combining a top-down FPN-like pathway and a bottom-up PANet-like pathway built on attention mechanisms, is designed to learn high-level global context and low-level local detail representations. Third, a fine-grained classification module based on weakly supervised learning selects discriminative feature points at the different pyramid levels, and these points are fed into a graph convolutional network to complete the classification. Experiments on the large-scale multi-class IP102 benchmark dataset show that the proposed method achieves a crop pest recognition accuracy of 77.81%, outperforming other state-of-the-art methods by nearly 2 percentage points. The results demonstrate that the proposed hybrid architecture, with its dual-path feature aggregation and fine-grained classification modules, is more effective for crop pest recognition than CNN-based methods and can be deployed in practical natural environments. The source code will be available at https://github.com/mwfang/pestconformer.
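For orientation, the following is a minimal PyTorch-style sketch of the pipeline the abstract describes: a hybrid CNN-Transformer backbone, dual-path attention-gated feature aggregation, weakly supervised selection of discriminative feature points, and graph-convolutional classification. Every module name, dimension, and design detail here (the two-level pyramid, the top-k activation-based point selection, the dense-affinity graph convolution) is a simplifying assumption made for illustration only; it is not the authors' Pest-ConFormer implementation, which they state will be released at https://github.com/mwfang/pestconformer.

# Minimal, illustrative PyTorch sketch of the pipeline summarized in the abstract.
# All module names, channel sizes, the top-k point selection and the dense-affinity
# graph convolution are simplifying assumptions, not the authors' released code
# (see https://github.com/mwfang/pestconformer for the official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridBackbone(nn.Module):
    """Stand-in for the hybrid CNN-Transformer encoder (assumed to be
    pre-trained separately with a masked-autoencoder objective)."""

    def __init__(self, dim=256):
        super().__init__()
        # Convolutional stem: low-level local detail at higher resolution.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        self.down = nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1)
        # Transformer stage: global context on the downsampled feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        c3 = self.stem(x)                        # (B, D, H, W) CNN features
        c4 = self.down(c3)                       # coarser feature map
        b, d, h, w = c4.shape
        tokens = c4.flatten(2).transpose(1, 2)   # (B, H*W, D)
        c4 = self.transformer(tokens).transpose(1, 2).reshape(b, d, h, w)
        return [c3, c4]                          # two pyramid levels


class DualPathAggregator(nn.Module):
    """Top-down (FPN-like) plus bottom-up (PANet-like) fusion, gated by a
    simple channel-attention block on each level."""

    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid())
             for _ in range(2)]
        )

    def forward(self, feats):
        c3, c4 = feats
        # Top-down: inject global context into the detailed map.
        p3 = c3 + F.interpolate(c4, size=c3.shape[-2:], mode="nearest")
        # Bottom-up: push refined detail back to the coarse map.
        p4 = c4 + F.adaptive_max_pool2d(p3, tuple(c4.shape[-2:]))
        return [p3 * self.attn[0](p3), p4 * self.attn[1](p4)]


class PointGCNHead(nn.Module):
    """Selects the k most activated feature points per pyramid level (a crude
    proxy for the paper's weakly supervised selection) and classifies them
    with a single dense-affinity graph-convolution layer."""

    def __init__(self, dim=256, k=16, num_classes=102):
        super().__init__()
        self.k = k
        self.gcn = nn.Linear(dim, dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, feats):
        nodes = []
        for f in feats:
            b, d, h, w = f.shape
            flat = f.flatten(2)                               # (B, D, H*W)
            idx = flat.norm(dim=1).topk(self.k, dim=1).indices
            idx = idx.unsqueeze(1).expand(-1, d, -1)          # (B, D, k)
            nodes.append(flat.gather(2, idx).transpose(1, 2)) # (B, k, D)
        x = torch.cat(nodes, dim=1)                           # graph nodes
        adj = F.softmax(x @ x.transpose(1, 2), dim=-1)        # node affinities
        x = F.relu(adj @ self.gcn(x))                         # message passing
        return self.cls(x.mean(dim=1))                        # graph-level logits


if __name__ == "__main__":
    backbone, neck, head = HybridBackbone(), DualPathAggregator(), PointGCNHead()
    logits = head(neck(backbone(torch.randn(2, 3, 224, 224))))
    print(logits.shape)  # torch.Size([2, 102]); IP102 has 102 pest classes

The sketch keeps only the structural idea: global context flows top-down into the detailed map, refined detail flows bottom-up, and classification is performed on a small graph of selected feature points rather than on a single pooled global vector.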
Pages: 15