Mining Spatio-temporal Data on Industrialization from Historical Registries

Cited by: 11
Authors
Berenbaum, D. [1]
Deighan, D. [1]
Marlow, T. [2]
Lee, A. [1]
Frickel, S. [2]
Howison, M. [1]
Affiliations
[1] Brown Univ, Comp & Informat Serv, Data Sci Practice, 3 Davol Sq, Providence, RI 02912 USA
[2] Brown Univ, Inst Brown Environm & Soc, 80 Waterman St, Providence, RI 02912 USA
Keywords
structured text; historical data; geocoding; page layout analysis; socio-environmental analysis; LAND-USE CONVERSIONS; WASTE; URBANIZATION; CLUSTERS
DOI
10.3808/jei.201700381
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Despite the growing availability of big data in many fields, historical data on socio-environmental phenomena are often not available due to a lack of automated and scalable approaches for collecting, digitizing, and assembling them. We have developed a data-mining method for extracting tabulated, geocoded data from printed directories. While scanning and optical character recognition (OCR) can digitize printed text, these methods alone do not capture the structure of the underlying data. Our pipeline integrates both page layout analysis and OCR to extract tabular, geocoded data from structured text. We demonstrate the utility of this method by applying it to scanned manufacturing registries from Rhode Island that record 41 years of industrial land use. The resulting spatio-temporal data can be used for socio-environmental analyses of industrialization at a resolution that was not previously possible. In particular, we find strong evidence for the dispersion of manufacturing from the urban core of Providence, the state's capital, along the Interstate 95 corridor to the north and south.
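The idea of combining page layout analysis with OCR can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that an OCR step has already produced word boxes as (text, x, y) tuples and that a layout-analysis step has already found the left edge of each column. The function name `words_to_table`, the `row_tolerance` parameter, and the sample registry entries are all hypothetical.

```python
from bisect import bisect_right


def words_to_table(words, column_edges, row_tolerance=5):
    """Group OCR word boxes into table rows and columns.

    words: list of (text, x, y) tuples; (x, y) is the top-left corner
        of a word box in pixels (assumed OCR output).
    column_edges: sorted x positions where each column starts
        (assumed output of a layout-analysis step).
    row_tolerance: maximum vertical offset, in pixels, from the first
        word of a row for a later word to join that row.
    """
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        # Start a new row when the word sits well below the current row.
        if not rows or y - rows[-1]["y"] > row_tolerance:
            rows.append({"y": y, "cells": [[] for _ in column_edges]})
        # Pick the last column whose left edge is at or before x.
        col = max(bisect_right(column_edges, x) - 1, 0)
        rows[-1]["cells"][col].append((x, text))
    # Within each cell, restore left-to-right reading order.
    return [
        [" ".join(t for _, t in sorted(cell)) for cell in row["cells"]]
        for row in rows
    ]


# Hypothetical word boxes for two registry entries (not real data).
sample_words = [
    ("Acme", 10, 100), ("Mills", 60, 102), ("12", 200, 101),
    ("Main", 230, 100), ("St", 280, 100), ("Textiles", 400, 99),
    ("Bay", 10, 130), ("Foundry", 50, 131), ("5", 200, 130),
    ("Dock", 220, 130), ("Rd", 270, 130), ("Metals", 400, 129),
]
table = words_to_table(sample_words, column_edges=[0, 190, 390])
for row in table:
    print(row)
```

Each output row holds a name, an address, and a product category as separate fields. The address field is what a later geocoding step would turn into coordinates, which is what plain OCR text alone would not support.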
Pages: 28-34
Page count: 7