Parameterization before Meta-Analysis: Cross-Modal Embedding Clustering for Forest Ecology Question-Answering

Cited by: 0
Authors
Tao, Rui [1 ,2 ]
Zhu, Meng [3 ]
Cao, Haiyan [2 ]
Ren, Hong-E [1 ,4 ]
Affiliations
[1] Northeast Forestry Univ, Coll Comp & Control Engn, Harbin 150040, Peoples R China
[2] Hulunbuir Univ, Coll Artificial Intelligence & Big Data, Hulunbuir 021008, Peoples R China
[3] Harbin Univ, Coll Informat Engn, Harbin 150076, Peoples R China
[4] Heilongjiang Forestry Intelligent Equipment Engn R, Harbin 150040, Peoples R China
Source
FORESTS | 2024, Vol. 15, No. 09
Keywords
forestry ecology; meta-analysis; cross-modal; question-answering; embedding clustering;
D O I
10.3390/f15091670
CLC Classification Number
S7 [Forestry]
Subject Classification Codes
0829; 0907
Abstract
In the field of forestry ecology, image data capture factual information, while the literature is rich with expert knowledge. The corpus within the literature can provide expert-level annotations for images, and the visual information within images naturally serves as a clustering center for the textual corpus. However, both image data and literature constitute large, rapidly growing, unstructured datasets of heterogeneous modalities. To address this challenge, we propose cross-modal embedding clustering, a method that parameterizes these datasets using a deep learning model trained with relatively few annotated samples. This approach offers a means to retrieve relevant factual information and expert knowledge from the database of images and literature through a question-answering mechanism. Specifically, we align images and literature across modalities using a pair of encoders, perform cross-modal information fusion, and feed the fused representation into an autoregressive generative language model for question-answering with user feedback. Experiments demonstrate that this cross-modal clustering method enhances the performance of image recognition, cross-modal retrieval, and cross-modal question-answering models, achieving superior results on standardized tasks across public datasets for all three tasks, including a 21.94% performance improvement on the cross-modal question-answering task of the ScienceQA dataset, thereby validating the efficacy of our approach. Essentially, our method targets cross-modal information fusion, combining perspectives from multiple tasks and utilizing cross-modal representation clustering of images and text. It thus addresses both the interdisciplinary complexity of forestry ecology literature and the parameterization of the unstructured, heterogeneous data that encapsulate species diversity in conservation images. Building on this foundation, intelligent methods leverage large-scale data to provide a research-assistant tool for conducting forestry ecological studies at larger temporal and spatial scales.
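As a concrete illustration of the pipeline the abstract outlines (a pair of encoders aligning the two modalities, cross-modal fusion, and an autoregressive language model conditioned on the fused representation), the following is a minimal PyTorch sketch. The dimensions, module names, CLIP-style symmetric InfoNCE objective, and prefix-token fusion below are all illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of a dual-encoder alignment + fusion pipeline.
# All sizes and the InfoNCE objective are assumptions for illustration,
# not the authors' architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Project image and text features into one shared embedding space."""
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)   # image-side head
        self.txt_proj = nn.Linear(txt_dim, emb_dim)   # text-side head
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable temperature

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, logit_scale):
    """Symmetric InfoNCE: matched image-text pairs sit on the diagonal."""
    logits = logit_scale.exp() * (z_img @ z_txt.t())
    targets = torch.arange(z_img.size(0), device=z_img.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: a batch of 8 pre-extracted feature vectors per modality.
enc = DualEncoder()
img = torch.randn(8, 2048)   # e.g. CNN/ViT image features
txt = torch.randn(8, 768)    # e.g. transformer text features
z_i, z_t = enc(img, txt)
loss = contrastive_loss(z_i, z_t, enc.logit_scale)
loss.backward()

# Fusion for QA: concatenate the aligned embeddings and map them to the
# hidden size of an autoregressive LM, where they would act as prefix tokens.
fuse = nn.Linear(2 * 256, 1024)   # 1024 = assumed LM hidden size
prefix = fuse(torch.cat([z_i, z_t], dim=-1)).unsqueeze(1)  # shape (8, 1, 1024)

With an alignment of this kind, each image embedding acts as a clustering center for nearby text embeddings in the shared space, which matches the clustering interpretation the abstract gives.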
Pages: 30