Analyzing Relatedness by Toponym Co-Occurrences on Web Pages

Cited by: 83
Authors
Liu, Yu [1]
Wang, Fahui [2]
Kang, Chaogui [1,4]
Gao, Yong [1]
Lu, Yongmei [3]
Affiliations
[1] Peking Univ, Beijing 100871, Peoples R China
[2] Louisiana State Univ, Baton Rouge, LA 70803 USA
[3] Texas State Univ, San Marcos, TX 78666 USA
[4] MIT, Cambridge, MA 02139 USA
Keywords
FLOW-DATA; 1ST LAW; KNOWLEDGE; GIS; CHINA; WORLD;
DOI
10.1111/tgis.12023
Chinese Library Classification number
P9 [Physical Geography]; K9 [Geography]
Discipline classification code
0705; 070501
Abstract
This research proposes a method for capturing relatedness between geographical entities based on the co-occurrences of their names on web pages. The basic assumption is that a higher count of co-occurrences of two place names implies a stronger relatedness between the places. The spatial structure of China at the provincial level is explored from the co-occurrences of two provincial units in one document, extracted by a web information retrieval engine. Analysis of the co-occurrences and topological distances between all pairs of provinces indicates that: (1) spatially close provinces generally have similar co-occurrence patterns; (2) the frequency of co-occurrences exhibits a power-law distance decay effect with an exponent of 0.2; and (3) the co-occurrence matrix can be used to capture the similarity/linkage between neighboring provinces and can be fed into a regionalization method to examine the spatial organization of China. The proposed method provides a promising approach to extracting valuable geographical information from large volumes of web pages.
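The power-law distance decay reported in finding (2) can be illustrated with a minimal sketch. It assumes the decay has the form c = k * d^(-beta), where c is a co-occurrence count and d a topological distance, and estimates beta by ordinary least squares on log-transformed values. The function name `fit_power_law_decay`, the constant k = 1000, and the input values are hypothetical and synthetic. They are not the paper's data, and the abstract does not state which fitting procedure the authors used.

```python
import math


def fit_power_law_decay(distances, cooccurrences):
    """Estimate (k, beta) in c = k * d**(-beta) by least squares in log-log space.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in cooccurrences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the line log(c) = intercept + slope * log(d).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay exponent beta is the negated slope.
    return math.exp(intercept), -slope


# Synthetic, noise-free counts generated with beta = 0.2 (assumed values).
distances = [1, 2, 3, 4, 5, 6]
cooccurrences = [1000.0 * d ** -0.2 for d in distances]

k, beta = fit_power_law_decay(distances, cooccurrences)
print(round(k, 3), round(beta, 3))
```

With an exponent as small as 0.2, counts fall off slowly: under this assumed form, doubling the distance reduces the expected count by only about 13 percent, since 2^(-0.2) is roughly 0.87.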
Pages: 89-107
Page count: 19