Supporting web-based address extraction with unsupervised tagging

Cited by: 3
Authors
Loos, Berenike [1]
Biemann, Chris [2]
Affiliations
[1] European Media Lab GmbH, D-69118 Heidelberg, Germany
[2] Univ Leipzig, NLP Dept, D-04103 Leipzig, Germany
DOI
10.1007/978-3-540-78246-9_68
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
The manual acquisition and modeling of tourist information, such as addresses of points of interest, is time-consuming and therefore costly. Furthermore, the encoded information is static and has to be refined for newly emerging sightseeing objects, restaurants or hotels. Automatic acquisition can support and enhance manual acquisition and can be implemented as a run-time approach to obtain information not encoded in the data or knowledge base of a tourist information system. In our work, we apply unsupervised learning to web-based address extraction from plain text taken from web pages that describe locations and contain their addresses. The data is processed by an unsupervised part-of-speech tagger (Biemann, 2006a), which constructs domain-specific categories via distributional similarity of stop word contexts and neighboring content words. In the address domain, separate tags for street names, locations and other address parts can be observed. To extract the addresses, we train a Conditional Random Field (CRF) on a labeled training set of addresses, using the unsupervised tags as features. Evaluation on a gold standard of correctly annotated data shows that unsupervised learning combined with state-of-the-art machine learning is a viable approach to support web-based information extraction, as it improves extraction quality compared to a setup without the unsupervised tagger.
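As an illustration of the step "using the unsupervised tags as features", the following is a minimal sketch of how per-token feature dictionaries for a CRF might be built. It is not the authors' implementation: the function names `token_features` and `sentence_features`, the cluster identifiers such as `C42`, and the example sentence are all hypothetical, and the feature set is an assumption chosen for readability.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], tags: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for the token at position i.

    `tags` holds one unsupervised cluster ID per token (hypothetical values;
    an unsupervised tagger yields unnamed clusters, not labels like STREET).
    """
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_digit": word.isdigit(),
        "is_capitalized": word[:1].isupper(),
        # Unsupervised tag of the current token and of its neighbours,
        # with BOS/EOS markers at the sentence boundaries.
        "tag": tags[i],
        "tag-1": tags[i - 1] if i > 0 else "BOS",
        "tag+1": tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }


def sentence_features(tokens: List[str], tags: List[str]) -> List[Dict[str, Feature]]:
    """Feature dicts for every token of one sentence."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [token_features(tokens, tags, i) for i in range(len(tokens))]


# Hypothetical example: cluster IDs are made up for illustration only.
example_tokens = ["Visit", "us", "at", "Hauptstrasse", "12", ",", "69117", "Heidelberg"]
example_tags = ["C7", "C3", "C1", "C42", "C9", "C2", "C9", "C15"]
example_features = sentence_features(example_tokens, example_tags)
```

A list of such dictionaries per sentence, paired with gold address labels, is the kind of input that common CRF toolkits accept. Dropping the three `tag` features would give a comparison setup without the unsupervised tagger.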
Pages: 577 / +
Page count: 3