Supporting web-based address extraction with unsupervised tagging

Cited by: 3
Authors
Loos, Berenike [1]
Biemann, Chris [2]
Affiliations
[1] European Media Lab GmbH, D-69118 Heidelberg, Germany
[2] Univ Leipzig, NLP Dept, D-04103 Leipzig, Germany
DOI: 10.1007/978-3-540-78246-9_68
Chinese Library Classification (CLC) number: F [Economics]
Subject classification code: 02
Abstract
The manual acquisition and modeling of tourist information, such as the addresses of points of interest, is time-consuming and therefore costly. Furthermore, the encoded information is static and must be updated for newly emerging sightseeing attractions, restaurants, or hotels. Automatic acquisition can support and enhance manual acquisition, and it can be implemented as a run-time approach to obtain information that is not encoded in the data or knowledge base of a tourist information system. In our work, we apply unsupervised learning to the task of web-based address extraction from plain text taken from web pages that describe locations and contain their addresses. The data is processed by an unsupervised part-of-speech tagger (Biemann, 2006a), which constructs domain-specific categories via the distributional similarity of stop-word contexts and neighboring content words. In the address domain, separate tags for street names, locations, and other address parts can be observed. To extract the addresses, we train a Conditional Random Field (CRF) on a labeled training set of addresses, using the unsupervised tags as features. Evaluation on a gold standard of correctly annotated data shows that unsupervised learning combined with state-of-the-art machine learning is a viable approach to supporting web-based information extraction: it improves extraction quality compared to omitting the unsupervised tagger.
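The abstract describes feeding the output of the unsupervised tagger into a CRF as features. The sketch below illustrates only that feature-construction step, under stated assumptions: each token already carries an unsupervised tag, the tag IDs (`T12`, `T3`, ...) and feature names are hypothetical, and the `use_utags` switch mimics the comparison "with vs. without the unsupervised tagger". It is not the authors' feature set or code.

```python
from typing import Dict, List


def token_features(tokens: List[str], utags: List[str], i: int,
                   use_utags: bool = True) -> Dict[str, object]:
    """Build a feature dict for token i (feature names are hypothetical)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": tok.lower(),
        "word.istitle": tok.istitle(),
        "word.isdigit": tok.isdigit(),
    }
    if use_utags:
        # Unsupervised tag of the current token (assumed precomputed by the tagger)
        feats["utag"] = utags[i]
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        if use_utags:
            feats["-1:utag"] = utags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        if use_utags:
            feats["+1:utag"] = utags[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sentence_features(tokens: List[str], utags: List[str],
                      use_utags: bool = True) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sequence."""
    if len(tokens) != len(utags):
        raise ValueError("tokens and utags must have the same length")
    return [token_features(tokens, utags, i, use_utags) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical address snippet with hypothetical unsupervised tag IDs
    toks = ["Hauptstraße", "12", ",", "69117", "Heidelberg"]
    tags = ["T12", "T3", "T0", "T3", "T7"]
    for f in sentence_features(toks, tags):
        print(f)
```

Lists of such dicts, paired with gold address labels, can be passed to a CRF toolkit that accepts dictionary features (for example, `sklearn_crfsuite.CRF.fit(X, y)`); running it once with `use_utags=False` gives the baseline without unsupervised tags.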
Pages: 577 / +
Number of pages: 3