Parameterization before Meta-Analysis: Cross-Modal Embedding Clustering for Forest Ecology Question-Answering

Cited by: 0
Authors
Tao, Rui [1 ,2 ]
Zhu, Meng [3 ]
Cao, Haiyan [2 ]
Ren, Hong-E [1 ,4 ]
Affiliations
[1] Northeast Forestry Univ, Coll Comp & Control Engn, Harbin 150040, Peoples R China
[2] Hulunbuir Univ, Coll Artificial Intelligence & Big Data, Hulunbuir 021008, Peoples R China
[3] Harbin Univ, Coll Informat Engn, Harbin 150076, Peoples R China
[4] Heilongjiang Forestry Intelligent Equipment Engn R, Harbin 150040, Peoples R China
Source
FORESTS | 2024, Vol. 15, No. 09
Keywords
forestry ecology; meta-analysis; cross-modal; question-answering; embedding clustering;
D O I
10.3390/f15091670
CLC Classification Number
S7 [Forestry]
Subject Classification Codes
0829; 0907
Abstract
In the field of forestry ecology, image data capture factual information, while the literature is rich with expert knowledge. The corpus within the literature can provide expert-level annotations for images, and the visual information within images naturally serves as a clustering center for the textual corpus. However, both image data and literature constitute large, rapidly growing, unstructured datasets of heterogeneous modalities. To address this challenge, we propose cross-modal embedding clustering, a method that parameterizes these datasets using a deep learning model trained with relatively few annotated samples. This approach offers a means to retrieve relevant factual information and expert knowledge from the database of images and literature through a question-answering mechanism. Specifically, we align images and literature across modalities using a pair of encoders, perform cross-modal information fusion, and feed the fused representation into an autoregressive generative language model for question-answering with user feedback. Experiments demonstrate that this cross-modal clustering method enhances the performance of image recognition, cross-modal retrieval, and cross-modal question-answering models, achieving superior results on standardized tasks across public datasets for all three tasks, including a 21.94% performance improvement on the cross-modal question-answering task of the ScienceQA dataset, thereby validating the efficacy of our approach. Essentially, our method targets cross-modal information fusion, combining perspectives from multiple tasks and utilizing cross-modal representation clustering of images and text. It thus addresses both the interdisciplinary complexity of forestry ecology literature and the parameterization of the unstructured, heterogeneous data that encapsulate species diversity in conservation images. Building on this foundation, intelligent methods leverage large-scale data to provide a research-assistant tool for conducting forestry ecological studies at larger temporal and spatial scales.
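As a concrete illustration of the pipeline the abstract outlines (a pair of encoders aligning the two modalities, cross-modal fusion, and an autoregressive language model conditioned on the fused representation), the following is a minimal PyTorch sketch. The dimensions, module names, CLIP-style symmetric InfoNCE objective, and prefix-token fusion below are all illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of a dual-encoder alignment + fusion pipeline.
# All sizes and the InfoNCE objective are assumptions for illustration,
# not the authors' architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Project image and text features into one shared embedding space."""
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)   # image-side head
        self.txt_proj = nn.Linear(txt_dim, emb_dim)   # text-side head
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable temperature

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, logit_scale):
    """Symmetric InfoNCE: matched image-text pairs sit on the diagonal."""
    logits = logit_scale.exp() * (z_img @ z_txt.t())
    targets = torch.arange(z_img.size(0), device=z_img.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: a batch of 8 pre-extracted feature vectors per modality.
enc = DualEncoder()
img = torch.randn(8, 2048)   # e.g. CNN/ViT image features
txt = torch.randn(8, 768)    # e.g. transformer text features
z_i, z_t = enc(img, txt)
loss = contrastive_loss(z_i, z_t, enc.logit_scale)
loss.backward()

# Fusion for QA: concatenate the aligned embeddings and map them to the
# hidden size of an autoregressive LM, where they would act as prefix tokens.
fuse = nn.Linear(2 * 256, 1024)   # 1024 = assumed LM hidden size
prefix = fuse(torch.cat([z_i, z_t], dim=-1)).unsqueeze(1)  # shape (8, 1, 1024)

With an alignment of this kind, each image embedding acts as a clustering center for nearby text embeddings in the shared space, which matches the clustering interpretation the abstract gives.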
Pages: 30