Supporting web-based address extraction with unsupervised tagging

Cited by: 3
Authors
Loos, Berenike [1 ]
Biemann, Chris [2 ]
Affiliations
[1] European Media Lab GmbH, D-69118 Heidelberg, Germany
[2] Univ Leipzig, NLP Dept, D-04103 Leipzig, Germany
DOI
10.1007/978-3-540-78246-9_68
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
The manual acquisition and modeling of tourist information, such as the addresses of points of interest, is time-consuming and therefore costly. Furthermore, the encoded information is static and has to be updated for newly emerging sightseeing attractions, restaurants or hotels. Automatic acquisition can support and enhance manual acquisition, and can be implemented as a run-time approach to obtain information not encoded in the data or knowledge base of a tourist information system. In our work we apply unsupervised learning to the challenge of web-based address extraction from plain text taken from web pages that deal with locations and contain their addresses. The data is processed by an unsupervised part-of-speech tagger (Biemann, 2006a), which constructs domain-specific categories via distributional similarity of stop word contexts and neighboring content words. In the address domain, separate tags for street names, locations and other address parts can be observed. To extract the addresses, we train a Conditional Random Field (CRF) on a labeled training set of addresses, using the unsupervised tags as features. Evaluation on a gold standard of correctly annotated data shows that unsupervised learning combined with state-of-the-art machine learning is a viable approach to support web-based information extraction: extraction quality improves compared with a CRF that omits the unsupervised tagger.
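The idea of using unsupervised tags as CRF features can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the cluster IDs (`C17`, `C3`, `C42`) and the example address are all invented for illustration, and the abstract does not specify the actual feature set. The sketch only builds per-token feature dictionaries, the input shape that common CRF toolkits accept. Training itself is left out.

```python
from typing import Dict, List, Sequence


def token_features(tokens: Sequence[str], tags: Sequence[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i from surface cues and unsupervised tags.

    `tags` holds one cluster ID per token, as an unsupervised tagger might
    assign them. The feature names here are hypothetical.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "is_digit": word.isdigit(),
        "is_title": word.istitle(),
        "utag": tags[i],  # unsupervised tag of the current token
    }
    # Add the unsupervised tags of the neighbouring tokens as context.
    if i > 0:
        feats["utag-1"] = tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["utag+1"] = tags[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sentence_features(tokens: Sequence[str], tags: Sequence[str]) -> List[Dict[str, object]]:
    """Feature dicts for a whole token sequence."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [token_features(tokens, tags, i) for i in range(len(tokens))]


# Invented example: an address-like token sequence with made-up cluster IDs.
tokens = ["Hauptstrasse", "12", "69117", "Heidelberg"]
tags = ["C17", "C3", "C3", "C42"]
features = sentence_features(tokens, tags)
```

Dropping the `utag` entries from these dictionaries would give the baseline the abstract compares against, a CRF trained without the unsupervised tagger.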
Pages: 577 / +
Page count: 3