A Framework for Agricultural Intelligent Analysis Based on a Visual Language Large Model

Cited by: 1
Authors
Yu, Piaofang [1, 2]
Lin, Bo [1, 2]
Affiliations
[1] Zhejiang Univ, Sch Software Technol, Ningbo 315048, Peoples R China
[2] Zhejiang Univ, Binjiang Inst, Innovat Ctr Informat, Hangzhou 310053, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
visual language large model; cross-modal fusion; image recognition; agricultural knowledge understanding;
DOI
10.3390/app14188350
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Smart agriculture has become an inevitable trend in the development of modern agriculture, driven in particular by the continuous progress of large language models such as the Chat Generative Pre-trained Transformer (ChatGPT) and the General Language Model (ChatGLM). Although these large models perform well at general knowledge learning, they still show limitations and make errors when faced with specialized agricultural knowledge, such as crop disease identification and growth stage judgment. Agricultural data spans images, text, and other modalities, all of which play an important role in agricultural production and management. To better learn the characteristics of the different data modalities in agriculture, achieve cross-modal data fusion, and thereby understand complex application scenarios, we propose AgriVLM, a framework that fine-tunes a visual language model on a large amount of agricultural data in order to analyze such data. It can fuse multimodal data and provide more comprehensive agricultural decision support. Specifically, it uses a Q-former as a bridge between an image encoder and a language model to achieve cross-modal fusion of agricultural image and text data. We then apply low-rank adaptation (LoRA) to fine-tune the language model, aligning agricultural image features with the pre-trained language model. Experimental results show that AgriVLM performs well in crop disease recognition and growth stage recognition, with recognition accuracy exceeding 90%, demonstrating its ability to analyze different modalities of agricultural data.
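The abstract names low-rank adaptation as the fine-tuning step but gives no implementation details. The following is a minimal sketch of the general LoRA idea only, not the authors' code: a frozen weight matrix W is combined with a trainable low-rank update, giving an effective weight W + (alpha / r) * B A. The class name `LoRALinear`, the parameters `rank` and `alpha`, and the toy 4x4 weight are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # frozen pre-trained weight, shape d_out x d_in
        self.scale = alpha / rank         # scaling applied to the low-rank update
        # A starts small and random, B starts at zero, so the adapter is a no-op at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A); the update has rank at most `rank`."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to x, a list of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count only the adapter entries (A and B); W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage with an assumed 4x4 "pre-trained" weight.
W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0],
     [0.5, 0.0, 0.0, 0.5]]
layer = LoRALinear(W, rank=1, alpha=1.0)
x = [1.0, 2.0, 3.0, 4.0]
base = layer.forward(x)           # B is zero, so this equals W applied to x
layer.B[0][0] = 1.0               # stand-in for a gradient step on the adapter
adapted = layer.forward(x)        # only the first output component can change
print(base, adapted, layer.trainable_params())
```

In this toy case the adapter has 8 trainable entries against 16 in the full matrix; the saving grows with layer size because the adapter scales with r * (d_in + d_out) rather than d_in * d_out.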
Pages: 15