Outlier Detection Based Accurate Geocoding of Historical Addresses

Cited by: 4
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2600, Australia
Source
DATA MINING, AUSDM 2019 | 2019 / Volume 1127
Keywords
Geocode matching; String comparison; Open Street Map; STREET;
DOI
10.1007/978-981-15-1699-3_4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research in the social sciences is increasingly based on large and complex databases, such as historical birth, marriage, death, and census records. Such databases can be analyzed individually to investigate, for example, changes in education, health, and emigration over time. Many of these historical databases contain addresses, and assigning geographical locations (latitude and longitude) to them, a process known as geocoding, provides the foundation for a wide range of studies based on spatial data analysis. Furthermore, geocoded records can be employed to enhance record linkage processes, where family trees for whole populations can be constructed. However, a challenging aspect of geocoding historical addresses is that they might have changed over time and are therefore only partially, or not at all, available in modern geocoding systems. In this paper, we present a novel method to geocode historical addresses in which we first use an online geocoding service to retrieve geocodes. For addresses where multiple geocodes are returned, we employ outlier detection to improve the accuracy of the assigned locations. For addresses where no geocode is found, for example due to spelling variations, we employ approximate string matching to identify the most likely correct spelling along with the corresponding geocode. Experiments on two real historical data sets, one from Scotland and the other from Finland, show that, compared to an online geocoding service, our method can reduce the number of addresses with multiple geocodes by over 80% and increase the number of addresses that go from no geocode to a single geocode by up to 31%.
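The two refinement steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which outlier detector or string comparison function the paper uses, so the sketch substitutes a simple distance-to-median rule and the standard library's `difflib` similarity. The function names, the 5 km threshold, the 0.8 similarity cutoff, and the sample coordinates and street names are all hypothetical.

```python
import difflib
import math
import statistics


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def drop_outlier_geocodes(candidates, max_km=5.0):
    """Keep only candidates within max_km of the coordinate-wise median point.

    Assumption: a fixed distance threshold stands in for whatever outlier
    detection method the paper actually applies.
    """
    if len(candidates) < 3:
        # Too few points to say which one is the outlier.
        return list(candidates)
    centre = (statistics.median(lat for lat, _ in candidates),
              statistics.median(lon for _, lon in candidates))
    return [c for c in candidates if haversine_km(c, centre) <= max_km]


def correct_spelling(address, known_addresses, cutoff=0.8):
    """Return the most similar known address, or None if nothing is close enough.

    Assumption: difflib's ratio stands in for the paper's approximate
    string matching.
    """
    matches = difflib.get_close_matches(address, known_addresses, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical geocodes returned for one address: three close together
# and one roughly 67 km away.
candidates = [(55.953, -3.188), (55.951, -3.190), (55.955, -3.185), (55.861, -4.251)]
kept = drop_outlier_geocodes(candidates)

# Hypothetical list of street names known to the geocoding service.
known = ["High Street", "Mill Lane", "Church Road"]
corrected = correct_spelling("Hihg Street", known)
```

With these inputs, `kept` holds the three clustered points and `corrected` is `"High Street"`, which could then be resubmitted to the geocoding service. A median-based centre is used so that a single distant candidate does not drag the reference point toward itself.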
Pages: 41-53
Page count: 13