Web Information Extraction for content augmentation

Cited by: 0
Authors
Janevski, A [1]
Dimitrova, N [1]
Affiliation
[1] Philips Res USA, Briarcliff Manor, NY 10510 USA
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Today, users have to cope with an overwhelming number of TV channels and Web content sources. We introduce automatic content augmentation as a novel approach to contextual information extraction on behalf of the user, where the context is provided by the primary content source (i.e., the TV channel) and tailored by the user's preferences. A key aspect of this approach is Web Information Extraction (WebIE), which automatically derives structured information from unstructured Web documents. Our system executes WebIE tasks, each an instantiation of WebIE rules (our generic document processors). We present two WebIE approaches: Diffusion WebIE, which crawls a wide set of Web pages and extracts information from a subset of the pertinent pages, and Laser WebIE, which accesses a select set of Web pages and extracts narrowly defined information. We describe the architecture and implementation details of the system and provide detailed Laser WebIE examples.
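The abstract separates WebIE rules (generic document processors), WebIE tasks (instantiations of those rules), and two extraction strategies. The Python sketch below illustrates that split under stated assumptions, not the paper's actual implementation: an in-memory `WEB` dictionary stands in for HTTP fetching, rules are modeled as regular expressions, and "pertinence" in Diffusion WebIE is approximated by case-insensitive keyword matching against the task's context. All names (`WebIERule`, `WebIETask`, `laser_webie`, `diffusion_webie`, `max_pages`) and the example pages are hypothetical.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# It stands in for real HTTP fetching so the sketch runs offline.
WEB = {
    "http://example.org/show": ("Tonight: Jazz Hour. Host: Ann Lee.",
                                ["http://example.org/bio", "http://example.org/ads"]),
    "http://example.org/bio": ("Ann Lee is a jazz pianist born in 1970.",
                               ["http://example.org/tour"]),
    "http://example.org/ads": ("Buy cheap phones now.", []),
    "http://example.org/tour": ("Jazz tour dates: May 3, May 9.", []),
}


@dataclass
class WebIERule:
    """Hypothetical generic document processor: a named regex pattern."""
    name: str
    pattern: str

    def apply(self, text: str) -> list:
        # Return every match (or capture-group value) of the pattern in the text.
        return re.findall(self.pattern, text)


@dataclass
class WebIETask:
    """A rule instantiated with a context, e.g. keywords taken from the TV program."""
    rule: WebIERule
    context: list


def laser_webie(task: WebIETask, urls: list) -> dict:
    """Visit only a select set of pages and extract narrowly defined facts."""
    results = {}
    for url in urls:
        text, _ = WEB.get(url, ("", []))
        found = task.rule.apply(text)
        if found:
            results[url] = found
    return results


def diffusion_webie(task: WebIETask, seed: str, max_pages: int = 10) -> dict:
    """Crawl broadly (breadth-first) from a seed, then extract only from pertinent pages."""
    seen, queue, results = {seed}, deque([seed]), {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        text, links = WEB.get(url, ("", []))
        # Pertinence filter (assumption): the page mentions a context keyword.
        if any(k.lower() in text.lower() for k in task.context):
            found = task.rule.apply(text)
            if found:
                results[url] = found
        # Enqueue unseen outgoing links to widen the crawl.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


if __name__ == "__main__":
    birth = WebIETask(WebIERule("birth_year", r"born in (\d{4})"), ["Ann Lee"])
    print(laser_webie(birth, ["http://example.org/bio"]))
    # {'http://example.org/bio': ['1970']}
    dates = WebIETask(WebIERule("tour_date", r"May \d+"), ["jazz"])
    print(diffusion_webie(dates, "http://example.org/show"))
    # {'http://example.org/tour': ['May 3', 'May 9']}
```

In this sketch, the Laser variant fetches exactly the URLs it is given, while the Diffusion variant discovers pages by following links and applies the rule only to pages that pass the pertinence test. The hypothetical `max_pages` parameter bounds how far the crawl spreads.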
Pages: A389-A392
Number of pages: 4
Related papers (50 in total)
  • [1] Exploiting Content Redundancy for Web Information Extraction
    Gulhane, Pankaj
    Rastogi, Rajeev
    Sengamedu, Srinivasan H.
    Tengli, Ashwin
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): 578 - 587
  • [2] Web Content Information Extraction Based on DOM Tree and Statistical Information
    Yu, Xin
    Jin, Zhengping
    [J]. 2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017: 1308 - 1311
  • [3] Content Information Extraction of Theme Web Pages based on Tag Information
    Wang, Jie
    Wu, Jian
    Zhang, Yafeng
    He, Guowan
    [J]. 2014 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2014), VOL 1, 2014: 501 - 504
  • [4] INFORMATION EXTRACTION VERSUS TEXT SEGMENTATION FOR WEB CONTENT MINING
    Fragkou, Pavlina
    [J]. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2013, 23 (08): 1109 - 1137
  • [5] Web Table Extraction, Retrieval and Augmentation
    Zhang, Shuo
    Balog, Krisztian
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019: 1409 - 1410
  • [6] A web page content information extraction method based on tag window
    Zhao, Xin-Xin
    Suo, Hong-Guang
    Liu, Yu-Shu
    [J]. Proceedings of 2006 International Conference on Machine Learning and Cybernetics, Vols 1-7, 2006: 1598 - 1601
  • [7] Extraction of Context Information from Web Content Using Entity Linking
    Hirata, Norifumi
    Shiramatsu, Shun
    Ozono, Tadachika
    Shintani, Toramatsu
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2013, 13 (02): 18 - 23
  • [8] Web Table Extraction, Retrieval, and Augmentation: A Survey
    Zhang, Shuo
    Balog, Krisztian
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2020, 11 (02)
  • [9] MedIEQ - Quality Labelling of Medical Web Content Using Multilingual Information Extraction
    Angel Mayer, Miquel
    Karkaletsis, Vangelis
    Stamatakis, Kostas
    Leis, Angela
    Villarroel, Dagmar
    Thomeczek, Christian
    Labsky, Martin
    Lopez-Ostenero, Fernando
    Honkela, Timo
    [J]. MEDICAL AND CARE COMPUNETICS 3, 2006, 121: 183 - +
  • [10] Extraction of Web Content Based on Content Type
    Verma, Manish Kumar
    Kumar, Sarowar
    Abhishek, Kumar
    Singh, M. P.
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT, ICT4SD 2015, VOL 1, 2016, 408: 105 - 113