Outlier Detection Based Accurate Geocoding of Historical Addresses

Cited by: 4
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2600, Australia
Source
DATA MINING, AUSDM 2019 | 2019 / Vol. 1127
Keywords
Geocode matching; String comparison; Open Street Map; STREET;
DOI
10.1007/978-981-15-1699-3_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in the social sciences is increasingly based on large and complex databases, such as historical birth, marriage, death, and census records. Such databases can be analyzed individually to investigate, for example, changes in education, health, and emigration over time. Many of these historical databases contain addresses, and assigning geographical locations (latitude and longitude) to them, a process known as geocoding, provides a foundation for a wide range of studies based on spatial data analysis. Furthermore, geocoded records can be employed to enhance record linkage processes, where family trees for whole populations can be constructed. However, a challenging aspect of geocoding historical addresses is that they might have changed over time and are therefore only partially available, or not available at all, in modern geocoding systems. In this paper, we present a novel method to geocode historical addresses in which we use an online geocoding service to initially retrieve geocodes. For addresses where multiple geocodes are returned, we employ outlier detection to improve the accuracy of the assigned locations. For addresses where no geocode was found, for example due to spelling variations, we employ approximate string matching to identify the most likely correct spelling along with the corresponding geocode. Experiments on two real historical data sets, one from Scotland and the other from Finland, show that, compared to an online geocoding service, our method can reduce the number of addresses with multiple geocodes by over 80% and increase the number of addresses that move from no geocode to a single geocode by up to 31%.
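The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which outlier detection technique or string comparison function the authors use, so everything below is an assumption made for illustration: the median-distance rule, the `max_km` and `cutoff` thresholds, the function names, and the use of Python's standard `difflib` are all hypothetical stand-ins, not the paper's method.

```python
import difflib
import math
from statistics import median


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def drop_outliers(candidates, max_km=5.0):
    """Keep candidate geocodes within max_km of the coordinate-wise median.

    Assumed rule for illustration only; max_km is a hypothetical threshold.
    With fewer than three candidates there is no meaningful majority, so
    the input is returned unchanged.
    """
    if len(candidates) < 3:
        return list(candidates)
    centre = (median(p[0] for p in candidates),
              median(p[1] for p in candidates))
    return [p for p in candidates if haversine_km(p, centre) <= max_km]


def closest_spelling(address, known_addresses, cutoff=0.8):
    """Return the most similar known address string, or None.

    Uses difflib's similarity ratio as a stand-in comparator; cutoff is a
    hypothetical threshold.
    """
    matches = difflib.get_close_matches(address, known_addresses,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative, approximate coordinates: three nearby points and one far away.
candidates = [(57.41, -6.19), (57.42, -6.20), (57.41, -6.20), (55.95, -3.19)]
kept = drop_outliers(candidates)

# Illustrative place names; "Portre" mimics a spelling variation.
known = ["Portree", "Dunvegan", "Broadford"]
best = closest_spelling("Portre", known)
```

Here `kept` retains the three clustered points and drops the distant one, and `best` resolves the misspelled name to its closest known spelling. A real pipeline would likely need a comparator and thresholds tuned to the language and period of the records.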
Pages: 41-53
Number of pages: 13