Outlier Detection Based Accurate Geocoding of Historical Addresses

Cited by: 4
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2600, Australia
Source
DATA MINING, AUSDM 2019 | 2019 / Vol. 1127
Keywords
Geocode matching; String comparison; Open Street Map; STREET;
DOI
10.1007/978-981-15-1699-3_4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research in the social sciences is increasingly based on large and complex databases, such as historical birth, marriage, death, and census records. Such databases can be analyzed individually to investigate, for example, changes in education, health, and emigration over time. Many of these historical databases contain addresses, and assigning geographical locations (latitude and longitude) to them, a process known as geocoding, provides the foundation for a wide range of studies based on spatial data analysis. Furthermore, geocoded records can be employed to enhance record linkage processes, where family trees for whole populations can be constructed. However, a challenging aspect of geocoding historical addresses is that they might have changed over time and are therefore only partially available, or not available at all, in modern geocoding systems. In this paper, we present a novel method to geocode historical addresses in which we use an online geocoding service to initially retrieve geocodes. For addresses where multiple geocodes are returned, we employ outlier detection to improve the accuracy of the locations assigned to them. For addresses where no geocode was found, for example due to spelling variations, we employ approximate string matching to identify the most likely correct spelling along with the corresponding geocode. Experiments on two real historical data sets, one from Scotland and the other from Finland, show that, compared to an online geocoding service, our method can reduce the number of addresses with multiple geocodes by over 80% and increase the number of addresses that move from no geocode to a single geocode by up to 31%.
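The two refinement steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which outlier detector or string comparison function the paper uses, so the sketch assumes a simple distance-to-median rule for outliers and Python's standard `difflib` similarity for spelling variations. The function names, the `max_km` and `cutoff` thresholds, and any sample values are hypothetical.

```python
import math
from difflib import get_close_matches
from statistics import median


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def drop_outliers(candidates, max_km=5.0):
    """Keep candidate geocodes within max_km of the coordinate-wise median.

    Assumption: a distance-to-median rule stands in for whatever outlier
    detection the paper applies. With fewer than three candidates there is
    no meaningful majority, so the input is returned unchanged.
    """
    if len(candidates) < 3:
        return list(candidates)
    centre = (median(p[0] for p in candidates),
              median(p[1] for p in candidates))
    return [p for p in candidates if haversine_km(p, centre) <= max_km]


def closest_spelling(address, known_addresses, cutoff=0.8):
    """Return the most similar known address string, or None if none is close.

    Assumption: difflib's sequence similarity stands in for the paper's
    approximate string matching.
    """
    matches = get_close_matches(address, known_addresses, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

In such a pipeline, `drop_outliers` would be applied to the list of coordinates a geocoding service returns for one address, and `closest_spelling` would be applied to an address that returned nothing, with the corrected spelling then sent to the service again. Both thresholds would need tuning for the data at hand.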
Pages: 41-53
Page count: 13
Related papers
  • [1] An Ontology Based Prototype for Geocoding Offset Addresses
    Ding, Linfang
    Zhang, Xuehu
    Wei, Ran
    Ma, Haoming
    Li, Qi
    2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING, VOL 2, PROCEEDINGS, 2009, : 239 - 243
  • [2] Geocoding patient addresses for biosurveillance
    Olson, KL
    Mandl, KD
    AMIA 2002 SYMPOSIUM, PROCEEDINGS: BIOMEDICAL INFORMATICS: ONE DISCIPLINE, 2002, : 1119 - 1119
  • [3] A graph-based approach for representing addresses in geocoding
    Zhang, Chen
    He, Biao
    Guo, Renzhong
    Ma, Ding
    COMPUTERS ENVIRONMENT AND URBAN SYSTEMS, 2023, 100
  • [4] GIS-based geocoding methods for area-based addresses and 3D addresses in urban areas
    Lee, Jiyeong
    ENVIRONMENT AND PLANNING B-PLANNING & DESIGN, 2009, 36 (01): : 86 - 106
  • [5] Positional error in automated geocoding of residential addresses
    Cayo, Michael R
    Talbot, Thomas O
    International Journal of Health Geographics, 2 (1)
  • [6] Geocoding addresses from a large population-based study: Lessons learned
    McElroy, JA
    Remington, PL
    Trentham-Dietz, A
    Robert, SA
    Newcomb, PA
    EPIDEMIOLOGY, 2003, 14 (04) : 399 - 407
  • [7] Historical geocoding assistant
    Mertel, Adam
    Zbiral, David
    Stachon, Zdenek
    Horinkova, Hana
    SOFTWAREX, 2021, 14
  • [8] Historical Collaborative Geocoding
    Cura, Remi
    Dumenieu, Bertrand
    Abadie, Nathalie
    Costes, Benoit
    Perret, Julien
    Gribaudi, Maurizio
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (07)
  • [9] Error and bias in geocoding school and students' home addresses
    Whitsel, Eric A.
    ENVIRONMENTAL HEALTH PERSPECTIVES, 2008, 116 (08) : A330 - A330
  • [10] A Comparative Study of Cluster Based Outlier Detection, Distance Based Outlier Detection and Density Based Outlier Detection Techniques
    Mandhare, Harshada C.
    Idate, S. R.
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2017, : 931 - 935