M2T: A Framework of Spatial Scene Description Text Generation based on Multi-source Knowledge Graph Fusion

Cited by: 0
Authors
Chen H. [1 ,2 ]
Guo D. [3 ]
Ge S. [1 ,2 ]
Wang J. [1 ]
Wang Y. [1 ,2 ]
Chen F. [4 ]
Yang W. [5 ,6 ]
Affiliations
[1] Computer Network Information Center, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
[3] Spatial Temporal Data Intelligence Research Lab, College of Information Sciences and Technology, Beijing University of Chemical Technology, Beijing
[4] Department of East Asian Studies, The University of Arizona, Tucson
[5] School of Urban Planning and Design, Peking University, Shenzhen
[6] Center for Shenzhen Natural Resources and Real Estate Evaluation and Development Research, Shenzhen
Funding
National Natural Science Foundation of China
Keywords
geographic knowledge graph; natural language generation; spatial attention; spatial cognition; spatial expression; spatial scene description; spatial understanding;
DOI
10.12082/dqxxkx.2023.230034
Abstract
Natural language is an effective tool for humans to describe things: it is diverse, easy to disseminate, and can encode the results of human spatial cognition. How to describe geographic spatial scenes in natural language has long been an important research direction in spatial cognition and geographic information science, with important applications in personalized unmanned tour guides, navigation for the visually impaired, virtual scene interpretation, and so on. In essence, describing a geographic spatial scene in natural language transforms a two-dimensional vector in geographic space into a one-dimensional vector in word space. Traditional models handle spatial relationships well but fall short in natural language description: (1) spatial relationship description models are one-way descriptions of the environment by humans and ignore the influence of the environment on the description; (2) spatial scenes emphasize traversal-based descriptions in which every spatial relationship is weighted equally, which is inconsistent with the varying attention humans pay to geographic entities and spatial relationships in the environment; (3) the spatial relationship computation of traditional models is a static description of a single scene, which makes it hard to meet the practical requirement of dynamically describing continuous scenes; (4) the natural language style of traditional models is mechanical and lacks the necessary knowledge support. This article proposes Map2Text (M2T), a spatial scene natural language generation framework that fuses multiple knowledge graphs. The framework establishes knowledge graphs for spatial relationships, language generation style, and spatial attention, and realizes both the fusion of these knowledge graphs and the generation of natural language descriptions of spatial scenes within a unified framework.
The spatial scene description knowledge graph solves the pruning problem of traversing spatial relationships and, by building a spatial relationship graph, links spatial scenes to one another, supporting continuous expression of spatial scenes. The natural language style knowledge graph relates spatial expression to language style, achieving diversified language styles appropriate for spatial natural language expression. The spatial attention knowledge graph captures the nuances of natural language spatial expression through an attention matrix built on the interaction state between the subject and object of the spatial scene. An experimental prototype system built around the Forbidden City in Beijing demonstrates that the generated results are close to human travel notes, with more complete content coverage and more diverse styles, verifying the effectiveness of the M2T framework and the potential value of natural language description of spatial scenes. © 2023 Journal of Geo-information Science. All rights reserved.
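The pipeline the abstract describes — attention-weighted pruning of spatial-relation triples followed by style-templated rendering — can be illustrated with a minimal sketch. Everything below (the entities, attention scores, relation phrasings, and style templates) is a hypothetical stand-in, not the authors' implementation:

```python
# Minimal illustrative sketch of the M2T idea: spatial-relation triples
# from a scene are weighted by a spatial-attention score, the least
# attended ones are pruned, and the rest are rendered via style templates.
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialRelation:
    subject: str   # geographic entity being described
    relation: str  # qualitative spatial relation, e.g. "north_of"
    obj: str       # reference geographic entity

# Toy "spatial attention" scores: higher means the relation draws more of
# the observer's attention (values are hypothetical).
ATTENTION = {
    ("Meridian Gate", "north_of", "Tiananmen"): 0.9,
    ("Hall of Supreme Harmony", "north_of", "Meridian Gate"): 0.7,
    ("Corner Tower", "northeast_of", "Meridian Gate"): 0.2,
}

# Toy style templates keyed by a language-style label.
TEMPLATES = {
    "guide": "{subject} lies {relation} {obj}.",
    "plain": "{subject} is {relation} {obj}.",
}

RELATION_PHRASES = {
    "north_of": "to the north of",
    "northeast_of": "to the northeast of",
}


def describe_scene(relations, style="guide", top_k=2):
    """Keep only the top_k most attended relations (pruning the full
    traversal) and render each one with the chosen style template."""
    ranked = sorted(
        relations,
        key=lambda r: ATTENTION.get((r.subject, r.relation, r.obj), 0.0),
        reverse=True,
    )[:top_k]
    template = TEMPLATES[style]
    return [
        template.format(subject=r.subject,
                        relation=RELATION_PHRASES[r.relation],
                        obj=r.obj)
        for r in ranked
    ]


scene = [SpatialRelation(*triple) for triple in ATTENTION]
print(describe_scene(scene, style="plain"))
```

The pruning step stands in for the traversal-pruning role of the spatial scene description knowledge graph, and the template choice stands in for the language-style knowledge graph; in the actual framework both are driven by knowledge graphs rather than hard-coded dictionaries.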
Pages: 1176-1185
Number of pages: 9