A Vegetable Leaf Disease Identification Model Based on Image-Text Cross-Modal Feature Fusion

Cited by: 3
Authors
Feng, Xuguang [1 ,2 ,3 ,4 ]
Zhao, Chunjiang [2 ,3 ]
Wang, Chunshan [1 ,2 ,3 ,4 ]
Wu, Huarui [2 ,3 ]
Miao, Yisheng [2 ,3 ]
Zhang, Jingjian [5 ]
Affiliations
[1] Hebei Agr Univ, Sch Informat Sci & Technol, Baoding, Peoples R China
[2] Natl Engn Res Ctr Informat Technol Agr, Beijing, Peoples R China
[3] Minist Agr & Rural Affairs Peoples Republ China, Agr Key Lab Digital Village, Beijing, Peoples R China
[4] Hebei Key Lab Agr Big Data, Baoding, Peoples R China
[5] Cangzhou Acad Agr & Forestry Sci, Cangzhou, Peoples R China
Source
FRONTIERS IN PLANT SCIENCE
Funding
National Natural Science Foundation of China;
Keywords
cross-modal fusion; transformer; few-shot; complex background; disease identification;
DOI
10.3389/fpls.2022.918940
CLC Classification
Q94 [Botany];
Discipline Code
071001;
Abstract
Given the wide variation in the appearance of crop diseases and the complex backgrounds of field images, automatic identification of field diseases is an extremely challenging topic in smart agriculture. To address this challenge, a popular approach is to design a Deep Convolutional Neural Network (DCNN) model that extracts visual disease features from the images and then identifies the diseases based on the extracted features. This approach performs well under simple background conditions but suffers from low accuracy and poor robustness under complex backgrounds. In this paper, an end-to-end disease identification model composed of a disease-spot region detector and a disease classifier (YOLOv5s + BiCMT) was proposed. Specifically, the YOLOv5s network was used to detect the disease-spot regions, providing a regional attention mechanism that facilitates the classifier's disease identification task. For the classifier, a Bidirectional Cross-Modal Transformer (BiCMT) model combining image and text modal information was constructed, which exploits the correlation and complementarity between the features of the two modalities to fuse and recognize disease features. Meanwhile, the problem of inconsistent lengths among the different modal data sequences was solved. Eventually, the YOLOv5s + BiCMT model achieved the best results on a small dataset: its Accuracy, Precision, Sensitivity, and Specificity reached 99.23%, 97.37%, 97.54%, and 99.54%, respectively. This paper shows that bidirectional cross-modal feature fusion combining disease images and texts is an effective method for identifying vegetable diseases in field environments.
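The record does not include code, so the following is only an illustrative sketch of the fusion idea the abstract describes: each modality's tokens attend over the other modality's tokens, and because a cross-attention output always has the length of its query sequence, image and text sequences of different lengths can still be fused. All names, dimensions, and the mean-pool-and-concatenate step are hypothetical simplifications (projections, layer norms, and the full transformer stack of BiCMT are omitted).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Queries from one modality attend over the other modality's tokens.
    # The output has as many rows as `queries`, so the two modalities
    # may have different sequence lengths.
    d = queries.shape[1]
    scores = queries @ keys_values.T / np.sqrt(d)    # (Nq, Nkv)
    return softmax(scores, axis=-1) @ keys_values    # (Nq, d)

def bidirectional_fuse(img_tokens, txt_tokens):
    img_enriched = cross_attention(img_tokens, txt_tokens)  # image queries text
    txt_enriched = cross_attention(txt_tokens, img_tokens)  # text queries image
    # Mean-pool each enriched stream, then concatenate into one fused vector
    # that a classification head could consume.
    return np.concatenate([img_enriched.mean(axis=0), txt_enriched.mean(axis=0)])

rng = np.random.default_rng(0)
img = rng.normal(size=(49, 32))   # e.g. 7x7 grid of visual patch embeddings
txt = rng.normal(size=(12, 32))   # e.g. 12 text-token embeddings
fused = bidirectional_fuse(img, txt)
print(fused.shape)                # (64,) — one vector despite unequal lengths
```

The point of the sketch is the asymmetry-tolerance: neither direction requires the sequences to be padded to a common length, which is one plausible way a bidirectional cross-modal transformer sidesteps the length-mismatch problem the abstract mentions.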
Pages: 12
Related Papers (50 total)
  • [1] An Enhanced Feature Extraction Framework for Cross-Modal Image-Text Retrieval
    Zhang, Jinzhi
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Xu
    Zhang, Haisu
    [J]. REMOTE SENSING, 2024, 16 (12)
  • [2] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [3] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2309 - 2312
  • [4] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Chen, Weijing
    Yao, Linli
    Jin, Qin
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1241 - 1251
  • [5] Adaptive Cross-Modal Embeddings for Image-Text Alignment
    Wehrmann, Jonatas
    Kolling, Camila
    Barros, Rodrigo C.
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12313 - 12320
  • [6] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
  • [7] Image-text bidirectional learning network based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Gu, Guanghua
    [J]. NEUROCOMPUTING, 2022, 483 : 148 - 159
  • [8] Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning
    Wang, Jian
    He, Yonghao
    Kang, Cuicui
    Xiang, Shiming
    Pan, Chunhong
    [J]. ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015, : 347 - 354
  • [9] A Transformer-Based Cross-Modal Image-Text Retrieval Method Using Feature Decoupling and Reconstruction
    Zhang, Huan
    Sun, Yingzhi
    Liao, Yu
    Xu, SiYuan
    Yang, Rui
    Wang, Shuang
    Hou, Biao
    Jiao, Licheng
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1796 - 1799
  • [10] Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval
    Seo, Sanghyun
    Kim, Juntae
    [J]. PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 350 - 353