Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images

Cited: 11
Authors
Xu, Ke [1 ,2 ,3 ,4 ,5 ]
Zhu, Yan [1 ,2 ,3 ,4 ,5 ]
Cao, Weixing [1 ,2 ,3 ,4 ,5 ]
Jiang, Xiaoping [1 ,2 ,3 ,4 ,5 ]
Jiang, Zhijian [6 ]
Li, Shuailong [6 ]
Ni, Jun [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Nanjing Agr Univ, Coll Agr, Nanjing, Peoples R China
[2] Natl Engn & Technol Ctr Informat Agr, Nanjing, Peoples R China
[3] Minist Educ, Engn Res Ctr Smart Agr, Nanjing, Peoples R China
[4] Jiangsu Key Lab Informat Agr, Nanjing, Peoples R China
[5] Jiangsu Collaborat Innovat Ctr Technol & Applicat, Nanjing, Peoples R China
[6] Nanjing Agr Univ, Coll Artificial Intelligence, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
weeds detection; RGB-D image; multi-modal deep learning; machine learning; three-channel network; CROP; VISION; GROWTH; IMPACT; YIELD;
DOI
10.3389/fpls.2021.732968
CLC number
Q94 [Botany];
Discipline code
071001;
Abstract
Single-modal images carry limited information for feature representation, and RGB images fail to detect grass weeds in wheat fields because of their similarity to wheat in shape. We propose a framework based on multi-modal information fusion for the accurate detection of weeds in wheat fields under natural conditions, overcoming the limitations of a single modality. First, we recode the single-channel depth image into a new three-channel image with a structure like that of an RGB image, which suits feature extraction by a convolutional neural network (CNN). Second, multi-scale object detection is realized by fusing the feature maps output by different convolutional layers. A three-channel network structure is designed to preserve the independence of the RGB and depth information while exploiting the complementarity of the two modalities, and ensemble learning with weight allocation at the decision level realizes the effective fusion of the multi-modal information. The experimental results show that, compared with weed detection based on RGB images alone, the accuracy of our method is significantly improved. With ensemble learning, the mean average precision (mAP) is 36.1% for grass weeds and 42.9% for broadleaf weeds, and the overall detection precision, as indicated by intersection over ground truth (IoG), is 89.3%, with the RGB and depth image weights set to alpha = 0.4 and beta = 0.3. These results suggest that our method can accurately detect the dominant weed species in wheat fields and that multi-modal fusion can effectively improve object detection performance.
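The two mechanisms the abstract describes can be sketched in a few lines. Note the hedges: the paper does not publish its exact depth-recoding scheme, so stacking the normalized depth with its two spatial gradients is only one plausible three-channel encoding; likewise, assigning the remaining weight 1 - alpha - beta to a joint RGB-D branch is an assumption, since the abstract reports only the RGB and depth weights.

```python
import numpy as np

def depth_to_three_channels(depth):
    """Recode a single-channel depth map into an RGB-like three-channel image.

    Hypothetical encoding: normalized depth plus its horizontal and vertical
    gradients. The paper's actual recoding scheme is not given in the abstract.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / (np.ptp(d) + 1e-6)  # normalize depth to [0, 1]
    gx = np.gradient(d, axis=1)             # horizontal depth gradient
    gy = np.gradient(d, axis=0)             # vertical depth gradient
    return np.stack([d, gx, gy], axis=-1)   # H x W x 3, CNN-ready

def fuse_scores(s_rgb, s_depth, s_rgbd, alpha=0.4, beta=0.3):
    """Decision-level fusion of per-box confidences from three branches.

    alpha and beta are the RGB and depth weights reported in the abstract;
    giving the residual weight to a joint RGB-D branch is an assumption.
    """
    return (alpha * np.asarray(s_rgb)
            + beta * np.asarray(s_depth)
            + (1.0 - alpha - beta) * np.asarray(s_rgbd))
```

With alpha = 0.4 and beta = 0.3, a box that all three branches score identically keeps that score after fusion, since the weights sum to one.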
Pages: 10
Related papers
50 records in total
  • [1] Multi-modal deep feature learning for RGB-D object detection
    Xu, Xiangyang
    Li, Yuncheng
    Wu, Gangshan
    Luo, Jiebo
    PATTERN RECOGNITION, 2017, 72 : 300 - 313
  • [2] RGB-D based multi-modal deep learning for face identification
    Lin, Tzu-Ying
    Chiu, Ching-Te
    Tang, Ching-Tung
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1668 - 1672
  • [3] RGB-D based multi-modal deep learning for spacecraft and debris recognition
    AlDahoul, Nouar
    Karim, Hezerul Abdul
    Momo, Mhd Adel
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [4] Multi-modal deep learning networks for RGB-D pavement waste detection and recognition
    Li, Yangke
    Zhang, Xinman
    WASTE MANAGEMENT, 2024, 177 : 125 - 134
  • [5] Multi-modal deep network for RGB-D segmentation of clothes
    Joukovsky, B.
    Hu, P.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (09) : 432 - 434
  • [6] Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities
    Gene-Mola, Jordi
    Vilaplana, Veronica
    Rosell-Polo, Joan R.
    Morros, Josep-Ramon
    Ruiz-Hidalgo, Javier
    Gregorio, Eduard
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2019, 162 : 689 - 698
  • [7] Multi-modal transformer for RGB-D salient object detection
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [8] Multi-modal uniform deep learning for RGB-D person re-identification
    Ren, Liangliang
    Lu, Jiwen
    Feng, Jianjiang
    Zhou, Jie
    PATTERN RECOGNITION, 2017, 72 : 446 - 457